Series Lecture 6: apply, groupby, window

Author: butters001 | Published: 2020-09-21 16:25

This lesson covers how to use the apply, grouping, and window methods of a pandas Series.

apply, grouping, and window methods

  • Series.apply()
  • Series.agg()
  • Series.aggregate()
  • Series.transform()
  • Series.map()
  • Series.groupby()
  • Series.rolling()
  • Series.expanding()
  • Series.pipe()

Details

First, create a Series (the examples assume pandas is imported as pd and NumPy as np):

In [4]: s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])                  

In [5]: s                                                                       
Out[5]: 
0    1.0
1    2.0
2    3.0
3    NaN
4    5.0
5    NaN
6    NaN
7    NaN
8    9.0
dtype: float64

1. Series.apply()

Series.apply(func, convert_dtype=True, args=(), **kwds)

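Calls func on each value of the Series.

Common parameters:
  • func: Python function or NumPy ufunc to apply
  • convert_dtype: bool, default True [whether to convert the result of func to a more suitable dtype; if False, the result keeps dtype=object]
  • args: tuple [positional arguments passed to func after the Series value]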
In [6]: s.apply(lambda x: x ** 2)                                               
Out[6]: 
0     1.0
1     4.0
2     9.0
3     NaN
4    25.0
5     NaN
6     NaN
7     NaN
8    81.0
dtype: float64


In [7]: def subtract_custom_value(x, custom_value): 
   ...:     return x - custom_value
In [8]: s.apply(subtract_custom_value, args=(5,))                               
Out[8]: 
0   -4.0
1   -3.0
2   -2.0
3    NaN
4    0.0
5    NaN
6    NaN
7    NaN
8    4.0
dtype: float64


In [9]: s.apply(np.log)                                                         
Out[9]: 
0    0.000000
1    0.693147
2    1.098612
3         NaN
4    1.609438
5         NaN
6         NaN
7         NaN
8    2.197225
dtype: float64
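As a small sketch of how args and keyword arguments reach func, the example below applies a helper to each value and compares the result with the equivalent vectorized arithmetic. The helper name scale_and_shift and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])


def scale_and_shift(x, factor, offset=0.0):
    """Hypothetical helper: compute x * factor + offset for a single value."""
    return x * factor + offset


# args supplies extra positional arguments; keyword arguments are forwarded to func
applied = s.apply(scale_and_shift, args=(2,), offset=1.0)

# The same computation written as vectorized arithmetic on the whole Series
vectorized = s * 2 + 1.0

# equals() treats NaN values in the same positions as equal
print(applied.equals(vectorized))
```

For simple arithmetic like this, the vectorized form is usually the faster choice; apply is mainly useful when the per-value logic cannot be expressed with Series operations.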

2. Series.agg()

Series.agg(func=None, axis=0, *args, **kwargs)

Aggregates the Series using one or more operations over the specified axis.

Common parameters:
  • func:function, str, list or dict
    • function
    • string function name
    • list of functions and/or function names, e.g. [np.sum, 'mean']
    • dict of axis labels -> functions, function names or list of such
In [11]: s.agg('min')                                                           
Out[11]: 1.0

In [12]: s.agg(['min', 'max'])                                                  
Out[12]: 
min    1.0
max    9.0
dtype: float64

3. Series.aggregate()

Series.aggregate(func=None, axis=0, *args, **kwargs)

Identical to Series.agg(); agg is an alias for aggregate.
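A quick check of the alias relationship, as a sketch: both names are expected to refer to the same function object, so any call gives the same result.

```python
import pandas as pd

s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])

# agg and aggregate should be the same function under two names
same_function = pd.Series.agg is pd.Series.aggregate

# Either name therefore produces the same result
same_result = s.agg('max') == s.aggregate('max')

print(same_function, same_result)
```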

4. Series.transform()

Series.transform(func, axis=0, *args, **kwargs)

Calls func on the Series and returns a result with the same length as the Series.

Common parameters:
  • func:function, str, list or dict
    • function
    • string function name
    • list of functions and/or function names, e.g. [np.sum, 'mean']
    • dict of axis labels -> functions, function names or list of such
In [13]: s.transform([np.sqrt, np.exp])                                         
Out[13]: 
       sqrt          exp
0  1.000000     2.718282
1  1.414214     7.389056
2  1.732051    20.085537
3       NaN          NaN
4  2.236068   148.413159
5       NaN          NaN
6       NaN          NaN
7       NaN          NaN
8  3.000000  8103.083928

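In this example, s.transform([np.sqrt, np.exp]) produces the same result as s.apply([np.sqrt, np.exp]) and s.agg([np.sqrt, np.exp]). The difference is that transform requires the result to keep the length of the Series, while agg also accepts functions that reduce it to a single value.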
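To illustrate the same-length requirement, here is a minimal sketch: an element-wise function passes through transform, while a reducing function such as 'sum' is expected to be rejected with a ValueError. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])

# An element-wise function keeps the original length and index
roots = s.transform(np.sqrt)
print(len(roots) == len(s))

# A reducing function collapses the Series to one value, which transform rejects
try:
    s.transform('sum')
    reducing_allowed = True
except ValueError:
    reducing_allowed = False

print(reducing_allowed)
```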

5. Series.map()

Series.map(arg, na_action=None)

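Replaces each value in the Series with the value that arg maps it to.
map accepts a function, a dict, or a Series. With a dict, a value that has no matching key becomes NaN, unless the dict defines a default (e.g. a defaultdict).

Common parameters:
  • na_action: {None, 'ignore'}, default None [if 'ignore', NaN values are propagated as-is and are not passed to the mapping]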
In [17]: s.map({1.0: 'a'})                                                      
Out[17]: 
0      a
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
dtype: object

# NaN values are mapped as well
In [20]: s.map('I am a {}'.format)                                              
Out[20]: 
0    I am a 1.0
1    I am a 2.0
2    I am a 3.0
3    I am a nan
4    I am a 5.0
5    I am a nan
6    I am a nan
7    I am a nan
8    I am a 9.0
dtype: object

# NaN values are not mapped
In [21]: s.map('I am a {}'.format, na_action='ignore')                          
Out[21]: 
0    I am a 1.0
1    I am a 2.0
2    I am a 3.0
3           NaN
4    I am a 5.0
5           NaN
6           NaN
7           NaN
8    I am a 9.0
dtype: object
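The three kinds of mapping described above can be compared side by side. This is a sketch with made-up labels ('a', 'b', 'other', 'one', 'two'); only the behavior for missing keys is the point.

```python
from collections import defaultdict

import pandas as pd

s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])

# Plain dict: values without a matching key become NaN
plain = s.map({1.0: 'a', 2.0: 'b'})

# defaultdict: values without a matching key receive the default instead
with_default = s.map(defaultdict(lambda: 'other', {1.0: 'a', 2.0: 'b'}))

# Series: the index of the lookup Series plays the role of the dict keys
lookup = pd.Series(['one', 'two'], index=[1.0, 2.0])
via_series = s.map(lookup)

print(plain.head(3).tolist())
print(with_default.head(3).tolist())
print(via_series.head(3).tolist())
```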

6. Series.groupby()

Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)

Groups the values of the Series by the given keys.

Common parameters:
  • by: mapping, function, label, or list of labels [determines the groups]
In [27]: s.groupby(['a', 'b', 'a', 'b', 'a', 'a', 'b', 'b', 'a']).mean()        
Out[27]: 
a    4.5
b    2.0
dtype: float64

In [28]: s.groupby(s>3).mean()                                                  
Out[28]: 
False    2.0
True     7.0
dtype: float64

# A NaN group key is dropped by default (dropna=True), so the value at index 8 is excluded
In [29]: s.groupby(['a', 'b', 'a', 'b', 'a', 'a', 'b', 'b', np.nan]).mean()     
Out[29]: 
a    3.0
b    2.0
dtype: float64

In [43]: df = pd.DataFrame({'A':['a', 'a', 'b'], 'B':[1, 2, 3]})                
In [44]: df                                                                     
Out[44]: 
   A  B
0  a  1
1  a  2
2  b  3
In [45]: df.groupby('A').sum()                                                  
Out[45]: 
   B
A   
a  3
b  3
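The sketch below revisits the NaN-key case and contrasts the default with dropna=False, which keeps the NaN key as its own group (this parameter requires pandas 1.1 or later). It also groups by a function, which pandas calls on each index label. The keys are passed as a Series here, an illustrative choice so that the missing key is handled unambiguously.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])
keys = pd.Series(['a', 'b', 'a', 'b', 'a', 'a', 'b', 'b', np.nan])

# Default dropna=True: the row whose key is NaN is left out
dropped = s.groupby(keys).mean()

# dropna=False: the NaN key forms a group of its own
kept = s.groupby(keys, dropna=False).mean()

# A function passed as `by` is called on each index label
by_parity = s.groupby(lambda i: i % 2).sum()

print(dropped)
print(kept)
print(by_parity)
```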

7. Series.rolling()

Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)

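Provides rolling window calculations.

Common parameters:
  • window: int, offset, or BaseIndexer subclass [size of the window]
  • min_periods: int, default None [minimum number of non-NaN observations a window needs to produce a value; otherwise the result is NA. For a window specified by an offset, min_periods defaults to 1; otherwise it defaults to the window size]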
In [50]: s.rolling(2).sum()                                                     
Out[50]: 
0    NaN
1    3.0
2    5.0
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
dtype: float64
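Most results above are NaN because a window of size 2 needs two non-NaN observations by default. As a sketch of the effect of min_periods, lowering it to 1 lets a window that contains a single valid value still produce a result.

```python
import pandas as pd

s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])

# Default: min_periods equals the window size, so any NaN in the window yields NaN
strict = s.rolling(2).sum()

# min_periods=1: one valid observation in the window is enough
relaxed = s.rolling(2, min_periods=1).sum()

print(pd.DataFrame({'strict': strict, 'relaxed': relaxed}))
```

Only the windows at index 6 and 7, which contain no valid value at all, remain NaN in the relaxed version.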

8. Series.expanding()

Series.expanding(min_periods=1, center=None, axis=0)

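Provides expanding window calculations.
Similar to rolling(), except that expanding() has no fixed window size: the window starts at the first element and keeps growing to include every element up to the current one.

Common parameters:
  • min_periods: int, default 1 [minimum number of non-NaN observations a window needs to produce a value; otherwise the result is NA]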
In [54]: s.expanding(min_periods=2).sum()                                       
Out[54]: 
0     NaN
1     3.0
2     6.0
3     6.0
4    11.0
5    11.0
6    11.0
7    11.0
8    20.0
dtype: float64

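Because the window must contain at least two non-NaN values, the first result is NaN. From there the window keeps growing downward; NaN entries add nothing to the sum, so the previous total is carried forward.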
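One way to read expanding() is as a rolling window that is as long as the whole Series with min_periods=1. The sketch below checks that reading on the example Series and contrasts it with cumsum(), which leaves NaN at the positions where the input is NaN.

```python
import pandas as pd

s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])

# Expanding sum with the default min_periods=1
expanding_sum = s.expanding().sum()

# A rolling window covering the whole Series behaves the same way
full_window = s.rolling(window=len(s), min_periods=1).sum()

# cumsum skips NaN when adding but keeps NaN at those positions
cumulative = s.cumsum()

print(pd.DataFrame({'expanding': expanding_sum,
                    'rolling': full_window,
                    'cumsum': cumulative}))
```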

9. Series.pipe()

Series.pipe(func, *args, **kwargs)

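Applies func(self, *args, **kwargs). func receives the whole Series as its first argument, not the individual values.

Common parameters:
  • func: function [the function to apply; args and kwargs are passed on to it]
# Add 1 to every value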
In [59]: s.pipe(lambda x: x+1)                                                  
Out[59]: 
0     2.0
1     3.0
2     4.0
3     NaN
4     6.0
5     NaN
6     NaN
7     NaN
8    10.0
dtype: float64

# Chained calls
In [61]: s.pipe(lambda x: x+1).pipe(lambda x: x+1)                              
Out[61]: 
0     3.0
1     4.0
2     5.0
3     NaN
4     7.0
5     NaN
6     NaN
7     NaN
8    11.0
dtype: float64

# Pass an argument: add 10
In [62]: s.pipe(lambda x, y: x+y, y=10)                                         
Out[62]: 
0    11.0
1    12.0
2    13.0
3     NaN
4    15.0
5     NaN
6     NaN
7     NaN
8    19.0
dtype: float64
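pipe becomes more readable with named functions than with lambdas. The sketch below chains two helpers, fill_missing and scale; both names are invented for this example and are not pandas functions. Each helper receives the whole Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, None, 5, None, None, None, 9])


def fill_missing(series, value):
    """Hypothetical helper: replace NaN entries with a fixed value."""
    return series.fillna(value)


def scale(series, factor):
    """Hypothetical helper: multiply every entry by a factor."""
    return series * factor


# Each step receives the complete Series returned by the previous step
piped = s.pipe(fill_missing, value=0).pipe(scale, factor=2)

# The same computation as nested calls, which reads inside-out
nested = scale(fill_missing(s, value=0), factor=2)

print(piped.tolist())
print(piped.equals(nested))
```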

Keep at it, even on weekends ✊ ✊ ✊!!!
