pandas练习

作者: 小T数据站 | 来源:发表于2019-01-27 17:58 被阅读0次

https://github.com/ajcr/100-pandas-puzzles/blob/master/100-pandas-puzzles.ipynb

  • 导入pandas
  1. 将pandas以pd的名字引入

import pandas

  1. 查看导入的pandas的版本

pd.version

  1. 打印出pandas库需要的库的所有版本信息

pd.show_versions()

  • 数据库基础
    一些基础的选择、排序、添加、聚合数据的方法
import numpy as  np
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
  1. 创建索引为labels的数据框df

df = pd.DataFrame(data=data,index=labels)

  1. 展示df基础信息和它的数据

df.describe()

  1. 返回df的前三行

df.head(3)

  1. 只选择 'animal' 和'age' 列

df[['animal','age']]

  1. 选择['animal', 'age']的 [3, 4, 8]行

df.loc[df.index[[3,4,8]],['animal','age']]

  1. 选择visits大于 2的数据

df.query('visits > 2')

  1. 选择有缺失值的行, i.e. is NaN.

df[df['age'].isnull()]

  1. 选择 animal 是cat和age小于3.

df.query('animal=="cat" and age < 3') # 单=是赋值,双==才是判别

  1. 选择 age is between 2 and 4 (inclusive)的行

df.query('age>= 2 and age<=4')

  1. 将age里的'f'改成1.5.

df.loc['f','age'] = 1.5

  1. 计算visits的和 (the total number of visits).

df['visits'].sum()

  1. 计算每一个动物的平均年龄

df.groupby(by='animal')['age'].mean()
16.添加新的一行 'k' ,数据自己填,然后再将此行删除返回原数据框
df.loc['k'] = [5,'dog',2,'no']
df = df.drop('k')
df

  1. 对每个动物的数量进行计数

df['animal'].value_counts()

  1. Sort df first by the values in the 'age' in decending order, then by the value in the 'visit' column in ascending order.

df.sort_values(by=['age','visits'],ascending=[False,True])

  1. The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be True and 'no' should be False.

df['priority'] = df['priority'].map({'yes': True, 'no': False})
df

  1. In the 'animal' column, change the 'snake' entries to 'python'.

df['animal']=df['animal'].replace('snake','python')
df

  1. For each animal type and each number of visits, find the mean age. In other words, each row is an animal, each column is a number of visits and the values are the mean ages (hint: use a pivot table).

df.pivot_table(values='age',index='animal',columns='visits',aggfunc='mean')

  • 数据框:超越基础
    你可能需要结合使用两个或者更多的方法去得到正确的答案
  1. You have a DataFrame df with a column 'A' of integers. For example:
    df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
    How do you filter out rows which contain the same integer as the row immediately above?

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
df.drop_duplicates(subset='A',keep='first')

  1. Given a DataFrame of numeric values, say
    df = pd.DataFrame(np.random.random(size=(5, 3))) # a 5x3 frame of float values
    how do you subtract the row mean from each element in the row?

df = pd.DataFrame(np.random.random(size=(5, 3)))
df.sub(df.mean(axis=1),axis=0)

  1. Suppose you have DataFrame with 10 columns of real numbers, for example:
    df = pd.DataFrame(np.random.random(size=(5, 10)), columns=list('abcdefghij'))
    Which column of numbers has the smallest sum? (Find that column's label.)

df = pd.DataFrame(np.random.random(size=(5, 10)), columns=list('abcdefghij'))
df.sum().idxmin()

  1. How do you count how many unique rows a DataFrame has (i.e. ignore all rows that are duplicates)?

len(df.drop_duplicates(keep=False))

  1. You have a DataFrame that consists of 10 columns of floating--point numbers. Suppose that exactly 5 entries in each row are NaN values. For each row of the DataFrame, find the column which contains the third NaN value.
    (You should return a Series of column labels.)

(df.isnill().cumsum(axis=1)==3).idxmax(axis=1)

  1. A DataFrame has a column of groups 'grps' and and column of numbers 'vals'. For example:
    df = pd.DataFrame({'grps': list('aaabbcaabcccbbc'),
    'vals': [12,345,3,1,45,14,4,52,54,23,235,21,57,3,87]})
    For each group, find the sum of the three greatest values.

df = pd.DataFrame({'grps': list('aaabbcaabcccbbc'),
'vals': [12,345,3,1,45,14,4,52,54,23,235,21,57,3,87]})
df.groupby(by='grps')['vals'].nlargest(3).sum(level=0)

  1. A DataFrame has two integer columns 'A' and 'B'. The values in 'A' are between 1 and 100 (inclusive). For each group of 10 consecutive integers in 'A' (i.e. (0, 10], (10, 20], ...), calculate the sum of the corresponding values in column 'B'.

df.groupby(pd.cut(df['A'],b ins=np.arange(0,101,10)))['B'].sum()
29~32 hard部分先pass

  • Series and DatetimeIndex
  1. Create a DatetimeIndex that contains each business day of 2015 and use it to index a Series of random numbers. Let's call this Series s.

time_index = pd.date_range('2015-01-01','2015-12-31',freq='B')
s = pd.Series(np.random.rand(len(time_index)),index=time_index)
s.head()

  1. Find the sum of the values in s for every Wednesday.

s[s.index.weekday == 2].sum()

  1. For each calendar month in s, find the mean of values.

s.resample('M').mean()

  1. For each group of four consecutive calendar months in s, find the date on which the highest value occurred.

s.groupby(pd.Grouper(freq='4M')).idxmax()

  1. Create a DateTimeIndex consisting of the third Thursday in each month for the years 2015 and 2016.

pd.date_range('2015-01-01', '2016-12-31', freq='WOM-3THU')

相关文章

  • 爬虫分析之数据存储——基于MySQL,Scrapy

    上一篇->爬虫练习之数据整理——基于Pandas上上篇->爬虫练习之数据清洗——基于Pandas 配置MySql ...

  • Pandas练习

    索引练习 索引是不能更常见的操作了。默认选取列,比如df['Date'],如果对行,用标签索引ix,如df.ix[...

  • pandas练习

    练习地址 以下是伪目录: 1、读取以及了解数据 2、筛选以及排序 3、分组 4、Apply 5、merge 6、s...

  • pandas练习

    https://github.com/ajcr/100-pandas-puzzles/blob/master/10...

  • pandas实例-总结

    继续前面的练习,之前的文章参考: pandas实例-了解你的数据-Chipotle pandas实例-筛选与排序-...

  • pandas实例-Visualization-Titanic_D

    继续前面的练习,之前的文章参考: pandas实例-了解你的数据-Chipotle pandas实例-筛选与排序-...

  • pandas实例-Visualization-Scores

    继续前面的练习,之前的文章参考: pandas实例-了解你的数据-Chipotle pandas实例-筛选与排序-...

  • pandas实例-merge-House Market

    继续前面的练习,之前的文章参考: pandas实例-了解你的数据-Chipotle pandas实例-筛选与排序-...

  • pandas实例-Stats-US_Baby_Names

    继续前面的练习,之前的文章参考: pandas实例-了解你的数据-Chipotle pandas实例-筛选与排序-...

  • pandas实例-Stats-Wind Statistics

    继续前面的练习,之前的文章参考: pandas实例-了解你的数据-Chipotle pandas实例-筛选与排序-...

网友评论

    本文标题:pandas练习

    本文链接:https://www.haomeiwen.com/subject/nccnjqtx.html