Publisher Quality Ranking
This document introduces a quality ranking system to ensure high-quality pricing data on Pythnet. The ranking system will use three metrics to evaluate each publisher's performance:
- Uptime (40% weight),
- Price Deviation (40% weight),
- Lack of Stalled Prices (20% weight).
The ranking will be calculated monthly to ensure that only the top-performing publishers remain permissioned for each price feed on Pythnet.
Important: Publishers with an uptime of at least 50% will be included in the ranking. If a publisher's uptime is less than 50%, then the deviation and the stalled score of the publisher will be 0 to reflect their ineligibility.
Publishers in Pythtest conformance who are not publishing on Pythnet and pass this uptime condition will also be ranked together with the Pythnet publishers for each symbol.
Checkout Publisher Rankings (opens in a new tab) on the main website to see the updated rank of the publishers.
Metrics Used for Ranking
The three metrics used for ranking are:
Uptime
This metric measures the percentage of time a publisher is available and actively publishing data. A higher numerical score indicates a higher uptime maintained for the price feed. Score range: 0-1.
Price Deviation
This metric measures deviations that occur between a publishers' price and the aggregate price, normalized by the aggregate confidence interval. A higher numerical score indicates a lower degree of price deviation. Score range: 0-1.
Lack of Stalled Prices
This metric penalizes publishers reporting the same value for the price of an asset for at least 100 consecutive slots because such instances indicate potential data staleness. Publishers with fewer stalled prices will get a higher score. Score range: 0-1.
Ranking Algorithm
Each metric is assigned a weight as mentioned above, and the score for each metric is calculated based on the publisher's performance. The scores from each metrics are aggregated with respect to their weights to get the final score for each publisher. The weight distribution is as follows:
- Uptime: 40%
- Price Deviation: 40%
- Lack of Stalled Prices: 20%
Publishers are then sorted based on their final scores with the highest score indicating the best performance. As mentioned earlier, the score for each metric range from 0 to 1, where 1 represents the best performance. Each publisher will also be assigned a rank based on their final scores, where lower ranks are assigned to publishers with higher scores indicating better performance.
Metric Calculations
This section provides a detailed breakdown of how each metric is calculated.
Uptime
Uptime measures a publisher's reliability and availability. If a publisher consistently provides data without interruptions, it indicates a high level of reliability. This is aligned with the current conformance testing, which checks the publisher's availability based on price publication within the number of slots as defined on each feed's "Max Slot Latency" which is available on each feed's web page and within each feed's metadata.
Uptime is given 40% weight in the final ranking.
Reason for Weight: Uptime is the most critical metric because consistent data availability is fundamental for ensuring a data feed's overall quality and reliability. A publisher that is frequently unable to publish for every set number of slots or unavailable would significantly disrupt the service, hence the high weight of 40%.
Price Deviation
This metric evaluates the deviations between the publisher's price and the aggregate price, normalized by the aggregate's confidence interval.
Where:
- is the total number of prices
- is the publisher's price at instance
- is the aggregate price at instance
- is the aggregate confidence interval at instance
It is calculated similarly to a z-score, where the aggregate price is the mean, and the aggregate confidence interval is the Standard Deviation. A higher deviation score indicates that the publisher's prices are significantly inconsistent with the aggregate prices, especially when these deviations exceed the confidence interval. The is calculated by ranking the among all publishers and expressing this rank as a percentage of the total number of publishers.
Price deviation is given 40% weight in the final ranking.
Reason for Weight: To maintain trust in the data published on a feed, it is crucial to ensure that reported prices are within a reasonable range of the aggregate price when adjusted for confidence intervals. Significant inconsistencies can undermine confidence in the published data, hence the weight of 40%.
Lack of Stalled Prices
This metric checks if the publisher is reporting the same price continuously for a specified duration. Repeated prices over an extended period can indicate data staleness or a problem with the data feed.
Where:
- is the total number of slots for the aggregate price.
- is the duration threshold for staleness in slot. The threshold is slots (or about seconds) for all the symbols but can change in the future on a per-symbol basis.
- is the price reported by the publisher at time .
- is an indicator function that returns 1 if the condition inside it is true and 0 otherwise.
- calculates the fraction of time periods where the price remains unchanged for T consecutive intervals.
- adjusts the raw stalled penalty multiplied by to a score out of 1, which penalizes higher staleness rates. It means that if a publisher publishes stalled prices more than 10% of the times, it will get the score of 0.
This metric is given 20% weight in the final ranking.
Reason for Weight: Staleness in data can mislead users into thinking the market conditions are unchanged, which can be detrimental in volatile market conditions. While important, measuring data staleness is deemed relatively less critical than evaluating uptime and price accuracy, hence a weight of 20%.