snowflake cloud data platform - What does the partition_depth_histogram actually show?

partition_depth_histogram is some kind of metric that can be read about here:

The definition reads:

A histogram depicting the distribution of overlap depth for each micro-partition in the table. The histogram contains buckets with widths:
0 to 16 with increments of 1.

For buckets larger than 16, increments of twice the width of the previous bucket (e.g. 32, 64, 128, …).

This leaves many questions:

Why is it always 17 buckets? What if we have more or fewer micro-partitions than 17?
What is the relation between micro-partitions and buckets in the histogram?
What do the numbers in the histogram mean?

To facilitate discussion and explanation, here is an example histogram and other metrics:

SYSTEM$CLUSTERING_INFORMATION('TEST2', '(COL1, COL3)')
{
"cluster_by_keys" : "LINEAR(COL1, COL3)",
"total_partition_count" : 1156,
"total_constant_partition_count" : 0,
"average_overlaps" : 117.5484,
"average_depth" : 64.0701,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 3,
"00003" : 3,
"00004" : 4,
"00005" : 6,
"00006" : 3,
"00007" : 5,
"00008" : 10,
"00009" : 5,
"00010" : 7,
"00011" : 6,
"00012" : 8,
"00013" : 8,
"00014" : 9,
"00015" : 8,
"00016" : 6,
"00032" : 98,
"00064" : 269,
"00128" : 698
},
"clustering_errors" : [ {
"timestamp" : "2023-04-03 17:50:42 +0000",
"error" : "(003325) Clustering service has been disabled.\n"
}
]
}

partition_depth_histogram is some kind of metric that can be read about here: https://docs.snowflake/en/sql-reference/functions/system_clustering_information

The definition reads:

A histogram depicting the distribution of overlap depth for each micro-partition in the table. The histogram contains buckets with widths:
0 to 16 with increments of 1.

For buckets larger than 16, increments of twice the width of the previous bucket (e.g. 32, 64, 128, …).

This leaves many questions:

Why is it always 17 buckets? What if we have more or fewer micro-partitions than 17?
What is the relation between micro-partitions and buckets in the histogram?
What do the numbers in the histogram mean?

To facilitate discussion and explanation, here is an example histogram and other metrics:

SYSTEM$CLUSTERING_INFORMATION('TEST2', '(COL1, COL3)')
{
"cluster_by_keys" : "LINEAR(COL1, COL3)",
"total_partition_count" : 1156,
"total_constant_partition_count" : 0,
"average_overlaps" : 117.5484,
"average_depth" : 64.0701,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 3,
"00003" : 3,
"00004" : 4,
"00005" : 6,
"00006" : 3,
"00007" : 5,
"00008" : 10,
"00009" : 5,
"00010" : 7,
"00011" : 6,
"00012" : 8,
"00013" : 8,
"00014" : 9,
"00015" : 8,
"00016" : 6,
"00032" : 98,
"00064" : 269,
"00128" : 698
},
"clustering_errors" : [ {
"timestamp" : "2023-04-03 17:50:42 +0000",
"error" : "(003325) Clustering service has been disabled.\n"
}
]
}

Share Improve this question edited Nov 16, 2024 at 12:51 Mark Rotteveel 110k230 gold badges156 silver badges225 bronze badges asked Nov 16, 2024 at 11:05 Sahand 8,40026 gold badges78 silver badges150 bronze badges

Add a comment |

1 Answer 1

Sorted by: Reset to default -1

The partition_depth_histogram in Snowflake's SYSTEM$CLUSTERING_INFORMATION function can be a bit confusing. I tried to break down your questions with examples:

1. Why is it always 17 buckets?

The number of buckets in the partition_depth_histogram is not always 17. It's actually a combination of two factors:

Fixed Buckets (0-16): The first 17 buckets (numbered from "00000" to "00016") represent a fixed range from 0 to 16 overlap depth with increments of 1. This provides detailed information about micro-partitions with very low overlap.

Dynamic Buckets (Larger than 16): For overlap depths exceeding 16, the buckets increase in size based on a doubling scheme. This means the next bucket would be "00032" (twice the size of the previous bucket) and so on. This approach efficiently represents the distribution of overlap depth for a wider range of values.

2. What is the relation between micro-partitions and buckets in the histogram?

The partition_depth_histogram doesn't directly map micro-partitions to specific buckets. Instead, it shows the number of micro-partitions that fall within a certain overlap depth range.

Bucket Value: Each bucket represents a specific overlap depth range. Value in the Bucket: The value associated with a bucket (e.g., 98 in "00032") indicates the number of micro-partitions in that specific overlap depth range.

3.What do the numbers in the histogram mean?

The numbers in the histogram represent the count of micro-partitions within a specific overlap depth range.

In your example:

There are 0 micro-partitions with an overlap depth of 0 ("00000"). There are 3 micro-partitions with an overlap depth between 2 and 3 ("00002"). There are 98 micro-partitions with an overlap depth between 32 and 64 ("00032"). There are 698 micro-partitions with an overlap depth exceeding 128 ("00128").

Programmer puzzle solving

snowflake cloud data platform - What does the partition_depth_histogram actually show? - Stack Overflow

1 Answer 1

Articles related to this article

comment list (0)