
How much cardinality can I have in my prometheus metrics?


I am trying to understand what the limitations are for ingesting custom metrics into Prometheus.

I understand that each metric with a unique label set is an active time series; for example, container_image_size{base=java, name=myapp} and container_image_size{base=java, container=myotherapp} would be two series.

I understand that Prometheus cannot deal with high cardinality, meaning too many labels and metrics. But what counts as "high"? How much is too high?

Thank you.

asked Nov 18, 2024 at 17:46 by user5994461

1 Answer


In my experience, a Prometheus server can easily handle on the order of 1000 servers each sending 1000 metrics. That's about a million active time series.

A Prometheus server can get to 10 million active series if you're willing to allocate the hardware for it and tune the collectors. Infrastructure metrics from the Prometheus exporter are often scraped on a 5s to 60s interval, whereas a 5-minute interval may be sufficient for custom metrics. A small adjustment to the interval (and retention period) has a massive impact on load and storage.
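To make that impact concrete, here is a back-of-the-envelope sketch (not from the answer; the numbers are illustrative): every active series produces one sample per scrape, so the ingestion rate scales inversely with the scrape interval.

```python
# Rough ingestion rate: samples/second = active series / scrape interval.
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    return active_series / scrape_interval_s

# 10 million series scraped every 15 seconds vs. every 5 minutes (300s):
print(samples_per_second(10_000_000, 15))   # ~667,000 samples/s
print(samples_per_second(10_000_000, 300))  # ~33,000 samples/s
```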

A typical server sends 1k to 10k metrics, depending on which collectors are enabled. A small server in the cloud is closer to 1k metrics; a large physical server is closer to 10k. The most notable example is the per-core CPU metrics, which generate a ton of series on physical servers with 100+ cores nowadays; they are enabled out of the box in the Prometheus exporter.

As a rule of thumb: if you make an app that sends one thousand active series, that's fine; it's like one more server to monitor.

If you make an app that sends one million active series, that's not fine; it's basically the size of the entire infrastructure estate, and it will topple the Prometheus server.
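To see how a single app gets there, the number of active series for one metric is roughly the product of the number of distinct values each label can take. A minimal sketch (illustrative numbers, not from the answer):

```python
from math import prod

def estimated_series(label_cardinalities: dict[str, int]) -> int:
    """Worst-case active series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_cardinalities.values())

# Bounded labels stay cheap, e.g. a handful of domains and ~100 status codes:
print(estimated_series({"domain": 5, "status": 100}))                 # 500
# One unbounded label (e.g. the request URL) and it explodes:
print(estimated_series({"domain": 5, "status": 100, "url": 50_000}))  # 25,000,000
```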

A few million series is not a lot, and that upper limit is a strong limitation on what Prometheus can be used for in practice. Let's consider some examples:

  • disk_usage_per_volume(disk="/etc") -> fine, there are only a dozen disks or volumes on a machine
  • disk_usage_per_dir(dir="/home/username/subdir/...") -> not fine, it can easily run into millions of directories
  • http_request(domain="example", status=200) -> fine, there are only a few domains and about a hundred HTTP status codes
  • http_request(domain="example", status=200, url="/order/cart/") -> not fine, too many URLs (see the sketch after this list)
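For the http_request example, here is a minimal instrumentation sketch using the Python prometheus_client library (the metric name, port, and helper function are illustrative, not from the answer). It keeps only the bounded labels and deliberately leaves the URL out:

```python
import time
from prometheus_client import Counter, start_http_server

# Only bounded labels: a few domains, ~100 possible status codes.
HTTP_REQUESTS = Counter(
    "http_requests_total",
    "HTTP requests by domain and status code",
    ["domain", "status"],
)

def handle_request(domain: str, status: int, url: str) -> None:
    # Deliberately do NOT add `url` as a label: an unbounded label value
    # set creates a new series per URL and blows up cardinality.
    HTTP_REQUESTS.labels(domain=domain, status=str(status)).inc()

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    handle_request("example", 200, "/order/cart/")
    time.sleep(60)           # keep the process alive so it can be scraped
```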

Things that have high cardinality, like web server requests, typically go to a logging system like Elasticsearch, which is optimized to store individual messages with repeated fields.
