Every yield number on StakingBoard refreshes hourly. People ask where it comes from. Here's the honest pipeline.
Three layers, not one
We don't pull a single API and republish. The pipeline has three layers:
- Raw observation: append-only *_snapshots tables, one row per source per fetch. Volatile. Sources come and go without contaminating the next layer down.
- Consensus history: append-only *_consensus_snapshots tables, post-median. When two or three sources cover the same field, this is where they get reconciled.
- Cache / capability: the cached fields you see on Asset / Pool / Chain / StakingProvider rows. The consensus job is the only writer. Adapter churn upstream is invisible here.
Adapters never write to layer 3. That's the architectural rule that lets us add or remove a data source without touching the public surface.
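To make the write rule concrete, here is a minimal in-memory sketch. It is not the production schema: the `Pipeline` and `RawSnapshot` names, the field name `sol_price_usd`, and the source labels are all hypothetical, and plain Python lists stand in for the append-only tables. The point is only that adapters get a single entry point (`record`), while the consensus job is the one code path that touches the cache.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import median


@dataclass(frozen=True)
class RawSnapshot:
    """Layer 1: one row per source per fetch (hypothetical shape)."""
    source: str
    field_name: str
    value: float
    fetched_at: datetime


@dataclass
class Pipeline:
    raw: list = field(default_factory=list)        # layer 1, append-only
    consensus: list = field(default_factory=list)  # layer 2, append-only
    cache: dict = field(default_factory=dict)      # layer 3, public surface

    def record(self, source: str, field_name: str, value: float) -> None:
        """The only entry point adapters get: append a row to layer 1."""
        self.raw.append(
            RawSnapshot(source, field_name, value, datetime.now(timezone.utc))
        )

    def run_consensus(self, field_name: str) -> None:
        """The consensus job: the only code path that writes layers 2 and 3."""
        latest = {}
        for row in self.raw:  # rows are in fetch order, so later ones win
            if row.field_name == field_name:
                latest[row.source] = row.value
        if not latest:
            return  # nothing observed yet; leave the cache untouched
        value = median(latest.values())
        self.consensus.append((field_name, value, sorted(latest)))
        self.cache[field_name] = value


# Illustrative values only, not real observations.
pipeline = Pipeline()
pipeline.record("source_a", "sol_price_usd", 150.0)
pipeline.record("source_b", "sol_price_usd", 152.0)
pipeline.run_consensus("sol_price_usd")
```

Dropping `source_b` in this sketch changes what lands in `raw` on the next cycle, but the shape of `cache` stays the same, which is the property the rule is meant to protect.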
What feeds which field
For a typical chain — say, Solana — here's what lands in our DB on an hourly cycle:
- DeFiLlama /yields/pools → ~17K pool rows (across all chains, not just Solana). Each row has apy_total, apy_base, apy_reward, tvl_usd, volume_24h, il_risk, outlier. Refreshed every hour.
- DeFiLlama /protocols → protocol-level TVL and audit metadata.
- Solana RPC + Stakewiz overlay → validator-level signals: vote credits, last vote slot, root slot, delinquent flag, software version, ASN concentration, skip rate.
- CoinGecko → SOL spot price, market cap, circulating supply.
- CommodityTrack → SOL price snapshot for the impermanent-loss simulator (independent from CoinGecko so we have two prices to median).
For a Cosmos-family chain (Cosmos Hub, Osmosis, Injective, Sei, Stride, …), we additionally hit:
- The chain's own LCD endpoint for staking params (unbonding_time, slash_fraction_double_sign, min_commission_rate), mint params (inflation_rate_change, goal_bonded), and per-validator commission rates + jail status.
Every chain has its own adapter. The signals we surface match the chain's own terms — we don't flatten Cosmos unbonding_time and Polkadot slash_defer_duration into a fake universal column.
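One way to keep chain-native terms separate is a distinct record type per chain family rather than one wide table of nullable columns. The sketch below assumes that approach; the class names and the `describe` helper are hypothetical, and the numeric values are illustrative. It also assumes durations arrive as seconds-suffixed strings such as "1814400s", and that slash_defer_duration is counted in eras.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Union


@dataclass(frozen=True)
class CosmosStakingSignals:
    unbonding_time: timedelta
    slash_fraction_double_sign: float
    min_commission_rate: float


@dataclass(frozen=True)
class PolkadotStakingSignals:
    slash_defer_duration: int  # assumed to be measured in eras


# A tagged union instead of a single flattened "lockup" column.
ChainSignals = Union[CosmosStakingSignals, PolkadotStakingSignals]


def parse_duration_seconds(raw: str) -> timedelta:
    """Parse a duration string such as "1814400s" into a timedelta."""
    if not raw.endswith("s"):
        raise ValueError(f"unexpected duration format: {raw!r}")
    return timedelta(seconds=float(raw[:-1]))


def describe(signals: ChainSignals) -> str:
    """Render each chain's signal under its own name and unit."""
    if isinstance(signals, CosmosStakingSignals):
        return f"unbonding_time={signals.unbonding_time.days}d"
    return f"slash_defer_duration={signals.slash_defer_duration} eras"


# Illustrative values only.
cosmos = CosmosStakingSignals(
    unbonding_time=parse_duration_seconds("1814400s"),
    slash_fraction_double_sign=0.05,
    min_commission_rate=0.05,
)
polkadot = PolkadotStakingSignals(slash_defer_duration=27)
```

Because the two types share no fields, nothing downstream can accidentally compare an unbonding period in days against a slash deferral in eras.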
When sources disagree
Multi-source fields go through a consensus median:
- If 3 sources publish ETH's native staking APR (beaconcha.in, beaconchain.in alternative, internal RPC computation) and they say 3.0 / 3.1 / 3.05, we publish 3.05.
- If 2 sources disagree by more than a configurable threshold (currently ±15% relative), we flag the field for manual review and don't update the cache layer until it resolves. The page shows the previous value with a stale-after-N-hours marker.
- If only 1 source remains live, we publish it, but apr_source is exposed on the API so anyone can verify.
We never blend curated baselines into the median. A source either has a live adapter writing layer-1 rows, or it doesn't. Honest staleness is preferable to invisible source-decay.
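The three rules above can be sketched as a single function. This is an illustration under stated assumptions, not the production consensus job: `reconcile`, `ConsensusResult`, and `relative_spread` are hypothetical names, and "relative" disagreement is assumed here to mean the gap divided by the midpoint of the two readings. The text does not say what happens when three sources diverge widely, so this sketch simply takes the median in that case.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

DISAGREEMENT_THRESHOLD = 0.15  # the 15% relative threshold from the text


@dataclass(frozen=True)
class ConsensusResult:
    value: Optional[float]  # None means: keep the previously cached value
    sources: tuple          # which sources contributed, for provenance
    needs_review: bool


def relative_spread(a: float, b: float) -> float:
    """Gap between two readings relative to their midpoint (assumed definition)."""
    midpoint = (abs(a) + abs(b)) / 2
    return 0.0 if midpoint == 0 else abs(a - b) / midpoint


def reconcile(readings: dict) -> ConsensusResult:
    """Reconcile live per-source readings for one field."""
    names = tuple(sorted(readings))
    values = [readings[name] for name in names]
    if not values:
        # No live source: publish nothing new and let the value age.
        return ConsensusResult(None, names, needs_review=False)
    if len(values) == 2 and relative_spread(*values) > DISAGREEMENT_THRESHOLD:
        # Two sources too far apart: hold the cache and flag for review.
        return ConsensusResult(None, names, needs_review=True)
    # One source publishes as-is; two close sources average; three take the middle.
    return ConsensusResult(median(values), names, needs_review=False)


three = reconcile({"a": 3.0, "b": 3.1, "c": 3.05})
split = reconcile({"a": 3.0, "b": 4.0})
single = reconcile({"a": 3.0})
```

Returning the contributing source names alongside the value is what makes a field like apr_source possible: a single-source result is published, but it is visibly single-source.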
What you don't get
A few things we deliberately don't publish:
- Forward yield projections. We surface what's happening right now with explicit ISO 8601 timestamps. Anyone who claims to project forward 12 months on DeFi rates is either an oracle or selling something.
- Promoted positions. Pool ranking is a pure function of APY and TVL for whichever sort is displayed, not paid placement. The affiliate disclosure covers what's promoted (banner CTAs on outbound links), and that material is visually separated from data tables.
- Single-source numbers without provenance. Every numeric field has an apr_as_of or metrics_updated_at timestamp. If a number on the page doesn't have a recent timestamp, that's a bug, not a feature.
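A consumer of the API could check the "recent timestamp" guarantee with something like the sketch below. The two-hour cutoff and the `is_stale` helper are assumptions for illustration; the actual staleness window is not stated here. The function expects an ISO 8601 string with an explicit offset, such as an apr_as_of value.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=2)  # hypothetical cutoff: two missed hourly cycles


def is_stale(timestamp_iso: str, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """Return True if an ISO 8601 timestamp is older than max_age.

    Offsets are written as "+00:00" here because datetime.fromisoformat
    only accepts a trailing "Z" on Python 3.11 and later.
    """
    observed = datetime.fromisoformat(timestamp_iso)
    if observed.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return now - observed > max_age


# Illustrative timestamps only.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_stale("2025-01-01T11:30:00+00:00", now)
old = is_stale("2025-01-01T08:00:00+00:00", now)
```

Rejecting timestamps without an offset is a deliberate choice in this sketch: a naive timestamp cannot be compared safely, so it is treated as an error rather than silently assumed to be UTC.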
The full data-source list lives in our llms.txt for AI-search citation; the structured methodology lives in docs/architecture/data-pipeline-three-layer.md in the public repo.