最新消息:Welcome to the puzzle paradise for programmers! Here, a well-designed puzzle awaits you. From code logic puzzles to algorithmic challenges, each level is closely centered on the programmer's expertise and skills. Whether you're a novice programmer or an experienced tech guru, you'll find your own challenges on this site. In the process of solving puzzles, you can not only exercise your thinking skills, but also deepen your understanding and application of programming knowledge. Come to start this puzzle journey full of wisdom and challenges, with many programmers to compete with each other and show your programming wisdom! Translated with DeepL.com (free version)

python - Difference between "single" and "table" methods in pandas.DataFrame.quantile - Stac

matteradmin9PV0评论

I hope someone can help me understand the difference between the "single" and "table" methods in pandas.DataFrame.quantile?

Whether to compute quantiles per-column (‘single’) or over all columns (‘table’). .DataFrame.quantile.html

For example, the following code yields the same results.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=["a", "b"])

print(df.quantile(method="single", interpolation="nearest"))
print(df.quantile(method="table", interpolation="nearest"))
a      3
b    100
Name: 0.5, dtype: int64
a      3
b    100
Name: 0.5, dtype: int64

I hope someone can help me understand the difference between the "single" and "table" methods in pandas.DataFrame.quantile?

Whether to compute quantiles per-column (‘single’) or over all columns (‘table’). https://pandas.pydata./docs/reference/api/pandas.DataFrame.quantile.html

For example, the following code yields the same results.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=["a", "b"])

print(df.quantile(method="single", interpolation="nearest"))
print(df.quantile(method="table", interpolation="nearest"))
a      3
b    100
Name: 0.5, dtype: int64
a      3
b    100
Name: 0.5, dtype: int64
Share Improve this question asked Nov 17, 2024 at 11:51 HowardHoward 1351 silver badge9 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 1

If you invert the order of the values in b the result will be different:

df['b'] = df['b'].values[::-1]

print(df.quantile(method='single', interpolation='nearest'))
print(df.quantile(method='table', interpolation='nearest'))

Output:

a      3
b    100
Name: 0.5, dtype: int64

a     3
b    10
Name: 0.5, dtype: int64

The exact difference in behavior is a bit tricky to understand.

This happens in core/frame.py:

        if method == "single":
            res = data._mgr.quantile(qs=q, interpolation=interpolation)
        elif method == "table":
            valid_interpolation = {"nearest", "lower", "higher"}
            if interpolation not in valid_interpolation:
                raise ValueError(
                    f"Invalid interpolation: {interpolation}. "
                    f"Interpolation must be in {valid_interpolation}"
                )
            # handle degenerate case
            if len(data) == 0:
                if data.ndim == 2:
                    dtype = find_common_type(list(self.dtypes))
                else:
                    dtype = self.dtype
                return self._constructor([], index=q, columns=data.columns, dtype=dtype)

            q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)

            by = data.columns
            if len(by) > 1:
                keys = [data._get_label_or_level_values(x) for x in by]
                indexer = lexsort_indexer(keys)
            else:
                k = data._get_label_or_level_values(by[0])
                indexer = nargsort(k)

            res = data._mgr.take(indexer[q_idx], verify=False)
            res.axes[1] = q

        result = self._constructor_from_mgr(res, axes=res.axes)
        return result.__finalize__(self, method="quantile")

In short, if you use method='single', this computes:

np.percentile(a, 50, axis=0, method='nearest')
# array([  3, 100])

With method='table', this computes the quantile on the indexer (np.arange(len(df))). The result with be an integer (q_idx) between 0 and len(df). The, this sorts the rows with lexsort_indexer, giving priority to the first column(s), and finally takes the (q_idx)th row:

data = df
q = 0.5
interpolation = 'nearest'

q_idx = np.quantile(np.arange(len(data)), q, method=interpolation)
# 2

indexer = lexsort_indexer([data._get_label_or_level_values(x) for x in data.columns])
# array([0, 1, 2, 3])

df.iloc[indexer]
#    a    b
# 0  1  100
# 1  2  100
# 2  3   10   # this row will be picked
# 3  4    1

data.iloc[indexer[q_idx]]
# a     3
# b    10
# Name: 2, dtype: int64

This means that the result is dependent on the order of the columns:

print(df.quantile(method='table', interpolation='nearest'))

a     3
b    10
Name: 0.5, dtype: int64

# now let's give b priority over a
print(df.iloc[:, ::-1].quantile(method='table', interpolation='nearest'))

b    100
a      1
Name: 0.5, dtype: int64
Post a comment

comment list (0)

  1. No comments so far