I2P's netDb is a specialized distributed database, containing just two types of data - router contact information (RouterInfos) and destination contact information (LeaseSets). Each piece of data is signed by the appropriate party and verified by anyone who uses or stores it. In addition, the data has liveness information within it, allowing irrelevant entries to be dropped and newer entries to replace older ones, and providing protection against certain classes of attack.
The netDb is distributed with a simple technique called "floodfill", where a subset of all routers, called "floodfill routers", maintains the distributed database.
When an I2P router wants to contact another router, it needs to know some key pieces of data - all of which are bundled up and signed by the router into a structure called the "RouterInfo", which is distributed with the SHA256 of the router's identity as the key. The structure itself contains:
- The router's identity (an encryption key, a signing key, and a certificate)
- The contact addresses at which it can be reached
- When this was published
- A set of arbitrary text options
- The signature of the above, generated by the identity's signing key
The following text options, while not strictly required, are expected to be present:
- caps (Capabilities flags - used to indicate floodfill participation, approximate bandwidth, and perceived reachability)
- f: Floodfill
- H: Hidden
- K: Under 12 KBps shared bandwidth
- L: 12 - 48 KBps shared bandwidth (default)
- M: 48 - 64 KBps shared bandwidth
- N: 64 - 128 KBps shared bandwidth
- O: 128 - 256 KBps shared bandwidth
- P: 256 - 2000 KBps shared bandwidth (as of release 0.9.20)
- R: Reachable
- U: Unreachable
- X: Over 2000 KBps shared bandwidth (as of release 0.9.20)
For compatibility with older routers, a router may publish multiple bandwidth letters, for example "PO".
- coreVersion (The core library version, always the same as the router version) (Never used, removed in release 0.9.24)
- netId = 2 (Basic network compatibility - A router will refuse to communicate with a peer having a different netId)
- router.version (Used to determine compatibility with newer features and messages)
- stat_uptime = 90m (Always sent as 90m, for compatibility with an older scheme where routers published their actual uptime, and only sent tunnel requests to peers whose uptime was more than 60m) (Unused since version 0.7.9, removed in release 0.9.24)
These values are used by other routers for basic decisions. Should we connect to this router? Should we attempt to route a tunnel through this router? The bandwidth capability flag, in particular, is used only to determine whether the router meets a minimum threshold for routing tunnels. Above the minimum threshold, the advertised bandwidth is not used or trusted anywhere in the router, except for display in the user interface and for debugging and network analysis.
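The capability letters listed above can be decoded with a few lines of code. The following is a minimal sketch, not the router's actual implementation: the names `Caps`, `parse_caps`, and `meets_minimum` are hypothetical, picking the highest advertised bandwidth letter when several are published is an assumption, and so is the default minimum tier used for the threshold check.

```python
from dataclasses import dataclass
from typing import Optional

# Bandwidth letters from the list above, lowest tier first.
BANDWIDTH_LETTERS = "KLMNOPX"


@dataclass
class Caps:
    floodfill: bool
    hidden: bool
    reachable: bool
    unreachable: bool
    bandwidth: Optional[str]  # highest advertised bandwidth letter, if any


def parse_caps(caps: str) -> Caps:
    """Split a capabilities string such as "POfR" into individual flags."""
    letters = [c for c in caps if c in BANDWIDTH_LETTERS]
    # Assumption: when several letters are published (e.g. "PO"),
    # treat the highest tier as the advertised one.
    best = max(letters, key=BANDWIDTH_LETTERS.index) if letters else None
    return Caps(
        floodfill="f" in caps,
        hidden="H" in caps,
        reachable="R" in caps,
        unreachable="U" in caps,
        bandwidth=best,
    )


def meets_minimum(caps: Caps, minimum: str = "L") -> bool:
    """Threshold check only; the default minimum tier is an assumption."""
    if caps.bandwidth is None:
        return False
    return BANDWIDTH_LETTERS.index(caps.bandwidth) >= BANDWIDTH_LETTERS.index(minimum)
```

Consistent with the text, the sketch uses the bandwidth letter only for a yes/no threshold decision and does not otherwise rank peers by advertised bandwidth.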
Additional text options include a small number of statistics about the router's health, which are aggregated by sites such as stats.i2p for network performance analysis and debugging. These statistics were chosen to provide data crucial to the developers, such as tunnel build success rates, while balancing the need for such data with the side-effects that could result from revealing this data. Current statistics are limited to:
- Exploratory tunnel build success, reject, and timeout rates
- 1 hour average number of participating tunnels
Floodfill routers publish additional data on the number of entries in their network database.
The data published can be seen in the router's user interface, but is not used or trusted within the router. As the network has matured, we have gradually removed most of the published statistics to improve anonymity, and we plan to remove more in future releases.
As of release 0.9.24, routers may declare that they are part of a "family", operated by the same entity. Multiple routers in the same family will not be used in a single tunnel.
The family options are:
- family (The family name)
- family.key (The signature type code of the family's Signing Public Key, in ASCII digits, concatenated with ':' concatenated with the Signing Public Key in base 64)
- family.sig (The signature of ((family name in UTF-8) concatenated with (32 byte router hash)) in base 64)
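To make the two concatenations in the family options concrete, here is a small sketch. The function names are hypothetical, the signature type code in the usage note is a placeholder, and the public key is treated as an opaque base 64 string; actual signing and verification are left out.

```python
def family_signed_payload(family_name: str, router_hash: bytes) -> bytes:
    """Bytes covered by family.sig: UTF-8 family name followed by the router hash."""
    if len(router_hash) != 32:
        raise ValueError("router hash must be 32 bytes")
    return family_name.encode("utf-8") + router_hash


def format_family_key(sig_type_code: int, public_key_b64: str) -> str:
    """Build a family.key value: ASCII digits, ':', then the key in base 64."""
    return f"{sig_type_code}:{public_key_b64}"


def parse_family_key(value: str):
    """Split a family.key value back into (signature type code, base 64 key)."""
    code, sep, key_b64 = value.partition(":")
    if not sep or not code.isdigit() or not key_b64:
        raise ValueError("malformed family.key value")
    return int(code), key_b64
```

For example, `parse_family_key(format_family_key(7, "AAAA"))` yields `(7, "AAAA")`; both values are placeholders chosen for illustration.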
RouterInfos have no set expiration time. Each router is free to maintain its own local policy to trade off the frequency of RouterInfo lookups with memory or disk usage. In the current implementation, there are the following general policies:
- There is no expiration during the first hour of uptime, as the persistent stored data may be old.
- There is no expiration if there are 25 or fewer RouterInfos.
- As the number of local RouterInfos grows, the expiration time shrinks, in an attempt to maintain a reasonable number of RouterInfos. The expiration time with fewer than 120 routers is 72 hours, while the expiration time with 300 routers is around 30 hours.
- RouterInfos containing SSU introducers expire in about an hour, as the introducer list expires in about that time.
- Floodfills use a short expiration time (1 hour) for all local RouterInfos, as valid RouterInfos will be frequently republished to them.
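The policies above can be read as a single function from local state to a maximum RouterInfo age. This is a sketch under stated assumptions, not the router's code: the function name and parameters are hypothetical, the order in which the rules are checked is a guess, and the inverse-proportional scaling above 120 routers was chosen only because it reproduces the two data points given (72 hours at 120 routers, roughly 30 hours at 300).

```python
from typing import Optional

HOUR = 3600  # seconds


def routerinfo_max_age(
    known_count: int,
    uptime_s: int,
    is_floodfill: bool = False,
    has_introducers: bool = False,
) -> Optional[int]:
    """Return the maximum age in seconds, or None for "do not expire"."""
    if uptime_s < HOUR:
        return None  # stored data may be old right after startup
    if known_count <= 25:
        return None  # too few RouterInfos to afford dropping any
    if is_floodfill or has_introducers:
        return HOUR  # short expiration for floodfills and introducer lists
    if known_count < 120:
        return 72 * HOUR
    # Assumption: shrink inversely with the number of known routers.
    return 72 * HOUR * 120 // known_count
```

With this scaling, 300 known routers give 28.8 hours, which is in line with the "around 30 hours" figure quoted above.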
RouterInfo Persistent Storage
RouterInfos are periodically written to disk so that they are available after a restart.
It may be desirable to persistently store Meta LeaseSets with long expirations. This is implementation-dependent.
The second piece of data distributed in the netDb is a "LeaseSet" - documenting a group of tunnel entry points (leases) for a particular client destination. Each of these leases specifies the following information:
- The tunnel gateway router (by specifying its identity)
- The tunnel ID on that router to send messages with (a 4 byte number)
- When that tunnel will expire.
The LeaseSet itself is stored in the netDb under the key derived from the SHA256 of the destination. One exception is for Encrypted LeaseSets (LS2), as of release 0.9.38. The SHA256 of the type byte (3) followed by the blinded public key is used for the DHT key, and then rotated as usual. See the Kademlia Closeness Metric section below.
In addition to these leases, the LeaseSet includes:
- The destination itself (an encryption key, a signing key and a certificate)
- Additional encryption public key: used for end-to-end encryption of garlic messages
- Additional signing public key: intended for LeaseSet revocation, but is currently unused.
- Signature of all the LeaseSet data, to make sure the Destination published the LeaseSet.
As of release 0.9.38, three new types of LeaseSets are defined: LeaseSet2, MetaLeaseSet, and EncryptedLeaseSet. See below.
A LeaseSet for a destination used only for outgoing connections is unpublished. It is never sent for publication to a floodfill router. "Client" tunnels, such as those for web browsing and IRC clients, are unpublished. Servers will still be able to send messages back to those unpublished destinations, because of I2NP storage messages.
A LeaseSet may be revoked by publishing a new LeaseSet with zero leases. Revocations must be signed by the additional signing key in the LeaseSet. Revocations are not fully implemented, and it is unclear if they have any practical use. This is the only planned use for that signing key, so it is currently unused.
As of release 0.9.38, floodfills support a new LeaseSet2 structure. This structure is very similar to the old LeaseSet structure, and serves the same purpose. The new structure provides the flexibility required to support new encryption types, multiple encryption types, options, offline signing keys, and other features. See proposal 123 for details.
Meta LeaseSet (LS2)
As of release 0.9.38, floodfills support a new Meta LeaseSet structure. This structure provides a tree-like structure in the DHT, to refer to other LeaseSets. Using Meta LeaseSets, a site may implement large multihomed services, where several different Destinations are used to provide a common service. The entries in a Meta LeaseSet are Destinations or other Meta LeaseSets, and may have long expirations, up to 18.2 hours. Using this facility, it should be possible to run hundreds or thousands of Destinations hosting a common service. See proposal 123 for details.
Encrypted LeaseSets (LS1)
This section describes the old, insecure method of encrypting LeaseSets using a fixed symmetric key. See below for the LS2 version of Encrypted LeaseSets.
In an encrypted LeaseSet, all Leases are encrypted with a separate key. The leases may only be decoded, and thus the destination may only be contacted, by those with the key. There is no flag or other direct indication that the LeaseSet is encrypted. Encrypted LeaseSets are not widely used, and it is a topic for future work to research whether the user interface and implementation of encrypted LeaseSets could be improved.
Encrypted LeaseSets (LS2)
As of release 0.9.38, floodfills support a new EncryptedLeaseSet structure. The Destination is hidden, and only a blinded public key and an expiration are visible to the floodfill. Only those that have the full Destination may decrypt the structure. The structure is stored at a DHT location based on the hash of the blinded public key, not the hash of the Destination. See proposal 123 for details.
For regular LeaseSets, the expiration is the time of the latest expiration of its leases. For the new LeaseSet2 data structures, the expiration is specified in the header. For LeaseSet2, the expiration should match the latest expiration of its leases. For EncryptedLeaseSet and MetaLeaseSet, the expiration may vary, and maximum expiration may be enforced, to be determined.
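The expiration rule can be illustrated with a small data model. This is a simplified sketch: the `Lease` and `LeaseSet` classes are hypothetical stand-ins carrying only the fields needed here, and timestamps are plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    gateway: bytes   # identifies the tunnel gateway router
    tunnel_id: int   # 4-byte tunnel ID on that router
    expires: int     # when the tunnel expires (integer timestamp)


@dataclass
class LeaseSet:
    leases: List[Lease]
    # Set only for the newer structures that carry an expiration in the header.
    header_expires: Optional[int] = None


def leaseset_expiration(ls: LeaseSet) -> int:
    """Header value if present, otherwise the latest lease expiration."""
    if ls.header_expires is not None:
        return ls.header_expires
    if not ls.leases:
        # A LeaseSet with zero leases is not covered by this sketch.
        raise ValueError("no leases and no header expiration")
    return max(lease.expires for lease in ls.leases)
```

For a regular LeaseSet with leases expiring at 1000, 3000, and 2000, the function returns 3000.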
LeaseSet Persistent Storage
No persistent storage of LeaseSet data is required, since they expire so quickly. However, persistent storage of EncryptedLeaseSet and MetaLeaseSet data with long expirations may be advisable.
Encryption Key Selection (LS2)
LeaseSet2 may contain multiple encryption keys. The keys are in order of server preference, most-preferred first. Default client behavior is to select the first key with a supported encryption type. Clients may use other selection algorithms based on encryption support, relative performance, and other factors.
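The default client behavior described above amounts to a first-match scan. In this sketch the function name is hypothetical and the encryption type codes are arbitrary placeholder integers, not real type identifiers.

```python
from typing import Optional, Sequence, Set, Tuple


def select_encryption_key(
    keys: Sequence[Tuple[int, bytes]],
    supported_types: Set[int],
) -> Optional[Tuple[int, bytes]]:
    """Return the first (type, key) pair whose type the client supports.

    `keys` is assumed to be in server preference order, most-preferred first.
    """
    for enc_type, key in keys:
        if enc_type in supported_types:
            return enc_type, key
    return None  # no mutually supported encryption type
```

A client that prefers a different policy (for example, ranking by relative performance) could reorder `keys` before the scan rather than changing the loop.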
The netDb is decentralized; however, you do need at least one reference to a peer so that the integration process ties you in. This is accomplished by "reseeding" your router with the RouterInfo of an active peer - specifically, by retrieving their RouterInfo file and storing it in your netDb/ directory. Anyone can provide you with those files - you can even provide them to others by exposing your own netDb directory. To simplify the process, volunteers publish their netDb directories (or a subset) on the regular (non-i2p) network, and the URLs of these directories are hardcoded in I2P. When the router starts up for the first time, it automatically fetches from one of these URLs, selected at random.
The floodfill netDb is a simple distributed storage mechanism. The storage algorithm is straightforward: send the data to the closest peer that has advertised itself as a floodfill router. When a peer in the floodfill netDb receives a netDb store from a peer not in the floodfill netDb, it sends the data to a subset of the floodfill netDb peers. The peers selected are the ones closest (according to the XOR metric) to a specific key.
Determining who is part of the floodfill netDb is trivial - it is exposed in each router's published routerInfo as a capability.
Floodfills have no central authority and do not form a "consensus" - they only implement a simple DHT overlay.
Floodfill Router Opt-in
Unlike Tor, where the directory servers are hardcoded and trusted, and operated by known entities, the members of the I2P floodfill peer set need not be trusted, and change over time.
To increase reliability of the netDb, and minimize the impact of netDb traffic on a router, floodfill is automatically enabled only on routers that are configured with high bandwidth limits. Routers with high bandwidth limits (which must be manually configured, as the default is much lower) are presumed to be on lower-latency connections, and are more likely to be available 24/7. The current minimum share bandwidth for a floodfill router is 128 KBytes/sec.
In addition, a router must pass several additional tests for health (outbound message queue time, job lag, etc.) before floodfill operation is automatically enabled.
With the current rules for automatic opt-in, approximately 6% of the routers in the network are floodfill routers.
While some peers are manually configured to be floodfill, others are simply high-bandwidth routers that automatically volunteer when the number of floodfill peers drops below a threshold. This prevents any long-term network damage from losing most or all floodfills to an attack. In turn, these peers will un-floodfill themselves when there are too many floodfills outstanding.
Floodfill Router Roles
The only services a floodfill router provides beyond those of a non-floodfill router are accepting netDb stores and responding to netDb queries. Since floodfill routers are generally high-bandwidth, they are more likely to participate in a high number of tunnels (i.e. be a "relay" for others), but this is not directly related to their distributed database services.
Kademlia Closeness Metric
The netDb uses a simple Kademlia-style XOR metric to determine closeness. To create a Kademlia key, the SHA256 hash of the RouterIdentity or Destination is computed. One exception is for Encrypted LeaseSets (LS2), as of release 0.9.38. The SHA256 of the type byte (3) followed by the blinded public key is used for the DHT key, and then rotated as usual.
A modification to this algorithm is done to increase the costs of Sybil attacks. Instead of the SHA256 hash of the key being looked up or stored, the SHA256 hash is taken of the 32-byte binary search key appended with the UTC date represented as an 8-byte ASCII string yyyyMMdd, i.e. SHA256(key + yyyyMMdd). This is called the "routing key", and it changes every day at midnight UTC. Only the search key is modified in this way, not the floodfill router hashes. The daily transformation of the DHT is sometimes called "keyspace rotation", although it isn't strictly a rotation.
Routing keys are never sent on-the-wire in any I2NP message; they are only used locally for determination of distance.
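The routing key formula SHA256(key + yyyyMMdd) translates directly into code. The sketch below uses only the Python standard library; the function name `routing_key` is hypothetical, and callers are expected to pass timezone-aware datetimes.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def routing_key(search_key: bytes, when: Optional[datetime] = None) -> bytes:
    """Compute SHA256(search_key + yyyyMMdd) for the UTC date of `when`."""
    if len(search_key) != 32:
        raise ValueError("search key must be 32 bytes")
    when = when or datetime.now(timezone.utc)
    # 8-byte ASCII date string in UTC, e.g. b"20240101".
    date = when.astimezone(timezone.utc).strftime("%Y%m%d").encode("ascii")
    return hashlib.sha256(search_key + date).digest()
```

Because the date is part of the hash input, the same search key maps to an unrelated 32-byte value on each side of midnight UTC, which is why the transformation is not strictly a rotation.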
Storage, Verification, and Lookup Mechanics
RouterInfo Storage to Peers
LeaseSet Storage to Peers
I2NP DatabaseStoreMessages containing the local LeaseSet are periodically exchanged with peers by bundling them in a garlic message along with normal traffic from the related Destination. This allows an initial response, and later responses, to be sent to an appropriate Lease, without requiring any LeaseSet lookups, or requiring the communicating Destinations to have published LeaseSets at all.
The DatabaseStoreMessage should be sent to the floodfill that is closest to the current routing key for the RouterInfo or LeaseSet being stored. Currently, the closest floodfill is found by a search in the local database. Even if that floodfill is not actually closest, it will flood it "closer" by sending it to multiple other floodfills. This provides a high degree of fault-tolerance.
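Finding the closest floodfill in the local database reduces to sorting known floodfill hashes by XOR distance to the routing key. A minimal sketch, with hypothetical function names and with the routing key assumed to be computed elsewhere:

```python
from typing import Iterable, List


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance: XOR of the two values read as big integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_floodfills(
    routing_key: bytes,
    floodfill_hashes: Iterable[bytes],
    count: int = 1,
) -> List[bytes]:
    """Return the `count` locally known floodfills nearest to the routing key.

    Floodfill router hashes are compared unmodified; only the search key
    has been turned into a routing key.
    """
    return sorted(floodfill_hashes, key=lambda h: xor_distance(h, routing_key))[:count]
```

The result is only as good as the local database, which is why the flooding step performed by the receiving floodfill matters for fault tolerance.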
In traditional Kademlia, a peer would do a "find-closest" search before inserting an item in the DHT to the closest target. As the verify operation will tend to discover closer floodfills if they are present, a router will quickly improve its knowledge of the DHT "neighborhood" for the RouterInfo and LeaseSets it regularly publishes. While I2NP does not define a "find-closest" message, if it becomes necessary, a router may simply do an iterative search for a key with the least significant bit flipped (i.e. key ^ 0x01) until no closer peers are received in the DatabaseSearchReplyMessages. This ensures that the true closest peer will be found even if a more-distant peer had the netdb item.
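The iterative search with a flipped least significant bit might look like the sketch below. It is illustrative only: `query` is a hypothetical callback standing in for sending a lookup to a peer and collecting the peer hashes from the reply, and `max_rounds` is an arbitrary safety limit.

```python
from typing import Callable, Iterable


def xor_distance(a: bytes, b: bytes) -> int:
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def flip_lsb(key: bytes) -> bytes:
    """Return the key with its least significant bit flipped (key ^ 0x01)."""
    return key[:-1] + bytes([key[-1] ^ 0x01])


def find_closest(
    key: bytes,
    start_peer: bytes,
    query: Callable[[bytes, bytes], Iterable[bytes]],
    max_rounds: int = 20,
) -> bytes:
    """Walk toward the key until a reply contains no closer peer."""
    target = flip_lsb(key)
    best = start_peer
    for _ in range(max_rounds):
        replies = query(best, target)  # peers suggested by `best`
        closer = [
            p for p in replies
            if xor_distance(p, target) < xor_distance(best, target)
        ]
        if not closer:
            break  # nobody closer was reported: stop here
        best = min(closer, key=lambda p: xor_distance(p, target))
    return best
```

Searching for the flipped key rather than the key itself means a peer that already holds the item still answers with a list of nearby peers, so the walk does not stop early at a more-distant holder.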
RouterInfo Storage to Floodfills
A router publishes its own RouterInfo by directly connecting to a floodfill router and sending it an I2NP DatabaseStoreMessage with a nonzero Reply Token. The message is not end-to-end garlic encrypted, as this is a direct connection, so there are no intervening routers (and no need to hide this data anyway). The floodfill router replies with an I2NP DeliveryStatusMessage, with the Message ID set to the value of the Reply Token.
LeaseSet Storage to Floodfills
Storage of LeaseSets is much more sensitive than for RouterInfos, as a router must take care that the LeaseSet cannot be associated with the router.
A router publishes a local LeaseSet by sending an I2NP DatabaseStoreMessage with a nonzero Reply Token over an outbound client tunnel for that Destination. The message is end-to-end garlic encrypted using the Destination's Session Key Manager, to hide the message from the tunnel's outbound endpoint. The floodfill router replies with an I2NP DeliveryStatusMessage, with the Message ID set to the value of the Reply Token. This message is sent back to one of the client's inbound tunnels.
After a floodfill router receives a DatabaseStoreMessage containing a valid RouterInfo or LeaseSet which is newer than that previously stored in its local NetDb, it "floods" it. To flood a NetDb entry, it looks up several (currently 3) floodfill routers closest to the routing key of the NetDb entry. (The routing key is the SHA256 Hash of the RouterIdentity or Destination with the date (yyyyMMdd) appended.) By flooding to those closest to the key, not closest to itself, the floodfill ensures that the storage gets to the right place, even if the storing router did not have good knowledge of the DHT "neighborhood" for the routing key.
The floodfill then directly connects to each of those peers and sends it an I2NP DatabaseStoreMessage with a zero Reply Token. The message is not end-to-end garlic encrypted, as this is a direct connection, so there are no intervening routers (and no need to hide this data anyway). The other routers do not reply or re-flood, as the Reply Token is zero.
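The store-and-flood behavior on the floodfill side can be summarized as follows. This is a simplified sketch: signature validation is omitted, messages are modeled as plain tuples, all names are hypothetical, and acknowledging a store before checking whether it is newer is an assumption about ordering.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

FLOOD_COUNT = 3  # "several (currently 3)" closest floodfills


@dataclass
class Entry:
    key: bytes       # 32-byte search key
    published: int   # larger means newer
    data: bytes


def handle_store(
    db: Dict[bytes, Entry],
    entry: Entry,
    reply_token: int,
    routing_key: bytes,
    floodfills: Iterable[bytes],
) -> List[Tuple]:
    """Return the messages a floodfill would send after receiving a store."""
    out: List[Tuple] = []
    if reply_token != 0:
        # Acknowledge: Message ID is set to the value of the Reply Token.
        out.append(("DeliveryStatusMessage", reply_token))
    old = db.get(entry.key)
    if old is not None and old.published >= entry.published:
        return out  # not newer than what is stored: do not flood
    db[entry.key] = entry
    if reply_token != 0:
        rk = int.from_bytes(routing_key, "big")
        # Flood to those closest to the routing key, not closest to ourselves.
        targets = sorted(floodfills, key=lambda h: int.from_bytes(h, "big") ^ rk)
        for target in targets[:FLOOD_COUNT]:
            # Zero Reply Token: the receiver neither replies nor re-floods.
            out.append(("DatabaseStoreMessage", target, 0))
    return out
```

A floodfill receiving one of the flooded copies runs the same function with `reply_token` equal to zero, stores the entry if it is newer, and produces no outgoing messages.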
RouterInfo and LeaseSet Lookup
The I2NP DatabaseLookupMessage is used to request a netdb entry from a floodfill router. Lookups are sent out through one of the router's outbound exploratory tunnels. The replies are specified to return via one of the router's inbound exploratory tunnels.
Lookups are generally sent to the two "good" (the connection doesn't fail) floodfill routers closest to the requested key, in parallel.
If the key is found locally by the floodfill router, it responds with an I2NP DatabaseStoreMessage. If the key is not found locally by the floodfill router, it responds with an I2NP DatabaseSearchReplyMessage containing a list of other floodfill routers close to the key.
LeaseSet lookups are garlic encrypted end-to-end as of release 0.9.5. RouterInfo lookups are not encrypted and thus are vulnerable to snooping by the outbound endpoint (OBEP) of the client tunnel. This is due to the expense of the ElGamal encryption. RouterInfo lookup encryption may be enabled in a future release.
As of release 0.9.7, replies to a LeaseSet lookup (a DatabaseStoreMessage or a DatabaseSearchReplyMessage) will be encrypted by