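This part covers 42 functions related to mathematical statistics. When you need statistical functionality, look here first. For some of these functions I did not fully understand the underlying mathematical concepts, so use them with care. All code examples below assume:
import pandas as pd
import numpy as np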
abs(1): returns the absolute value of each element of the Series.
lists = [1, -2, 3, -4, 5]
dp = pd.Series(lists)
print(dp.abs())
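all(2), any(3): all() returns True only if every element is truthy; any() returns True if at least one element is. For an empty Series, all() is True and any() is False. They did not seem very useful to me, so I have not written them up in detail for now.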
lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
print(dp.all())
print(pd.Series([np.nan]).all(skipna=False))  # NaN counts as truthy
print(pd.Series([], dtype=float).all())
print(dp.any())
print(pd.Series([np.nan]).any(skipna=False))
print(pd.Series([], dtype=float).any())
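autocorr(4): computes the lag-N autocorrelation, i.e. the Pearson correlation between the Series and a copy of itself shifted by lag positions (lag=1 by default). I have not fully worked this out yet and will come back to it later.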
lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
print(dp.autocorr())
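To make autocorr less opaque, the sketch below checks it against two hand-built versions, assuming (as the definition above says) that it is the Pearson correlation of the Series with its own shifted copy. The data values are arbitrary. For the 1 to 5 example above, every shifted pair lies on a straight line, so the result there is 1.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 2, 5, 4, 6])  # arbitrary example data
lag = 1

builtin = s.autocorr(lag=lag)

# By hand: correlate the series with itself shifted by `lag`;
# corr() ignores the NaN that shift() introduces at the start
via_shift = s.corr(s.shift(lag))

# Pure NumPy: pair each value with the value `lag` positions earlier
values = s.to_numpy(dtype=float)
via_numpy = np.corrcoef(values[lag:], values[:-lag])[0, 1]

print(builtin, via_shift, via_numpy)
```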
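between(5): similar to BETWEEN in Oracle SQL; returns a boolean Series. The inclusive parameter controls whether the boundaries are included: "both" (default), "neither", "left" or "right". Boolean values for inclusive were removed in pandas 2.0.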
lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
print(dp.between(1, 10))
# Exclude both boundaries (older pandas used inclusive=False)
print(dp.between(1, 4, inclusive='neither'))
clip(6): sets a lower and an upper bound; values outside the range are replaced by the nearest bound.
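lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
print(dp.clip())          # no bounds: values unchanged
print(dp.clip(2))         # lower bound only
print(dp.clip(2, 4))      # lower and upper bound
print('-------------------------')
print(dp.clip(2, 4, inplace=True))  # modifies dp itself and returns None
print(dp)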
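corr(7): computes the correlation between two Series, using a built-in method ('pearson' by default, 'kendall' or 'spearman') or a user-defined callable. The callable receives two 1-D arrays and should return a float. The 'kendall' and 'spearman' methods require SciPy.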
lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
dp1 = pd.Series([10, 11, 12, 13, 14])
print(dp.corr(dp1))
print(dp.corr(dp1, method='kendall'))
print(dp.corr(dp1, method='spearman'))

# A custom method receives two 1-D ndarrays and should return a float
def computer0(arg0, arg1):
    return float(np.sum(arg0 + arg1))

print(dp.corr(dp1, method=computer0))
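As a sanity check on the default method, the sketch below recomputes Pearson's r by hand as the covariance divided by the product of the two standard deviations. The data is a made-up example chosen so that r is not exactly 1 (unlike dp and dp1, which lie on a straight line). Only the Pearson method is used, so SciPy is not needed.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])  # made-up example data

# Pearson's r = cov(x, y) / (std(x) * std(y)); the n - 1 factors cancel out
r_manual = x.cov(y) / (x.std() * y.std())
r_builtin = x.corr(y)  # method='pearson' is the default

print(r_manual, r_builtin)
```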
count(8): counts the non-null (non-NaN) values.
lists = [1, 2, 3, 4, 5, np.nan]
dp = pd.Series(lists)
print(dp.count())
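cov(9): computes the sample covariance of two Series (normalized by N-1). I did not understand this one yet and need to review the math.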
lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
dp1 = pd.Series([10, 11, 12, 13, 14])
print(dp.cov(dp1))
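A minimal sketch of what the sample covariance computes, using the same two Series as above: subtract each Series' mean, multiply the deviations pairwise, sum them, and divide by n - 1. A positive result means the two Series tend to move in the same direction.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([10, 11, 12, 13, 14])

# Sample covariance: sum of products of deviations from each mean,
# divided by n - 1 (pandas' default ddof=1)
n = len(x)
cov_manual = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

print(cov_manual, x.cov(y))  # 2.5 2.5
```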
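cummax(10), cummin(11), cumprod(12), cumsum(13): 1) cumulative maximum: each position holds the largest value seen so far, so once the maximum is reached, every later value is shown as that maximum; 2) cumulative minimum, the same idea with the smallest value; 3) cumulative product (this equals the factorial only when the values are 1, 2, ..., n); 4) cumulative sum. With skipna=False, every position from the first NaN onward becomes NaN.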
lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
print(dp.cummax())
print(dp.cummin())
print(dp.cumprod())
print(dp.cumsum())
lists = [1, 2, 6, 4, 5, np.nan]
dp = pd.Series(lists)
print(dp.cummax(skipna=False))
print(dp.cummin(skipna=False))
print(dp.cumprod(skipna=False))
print(dp.cumsum(skipna=False))
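The cumulative functions are running aggregates. The sketch below is a plain-Python cross-check built on itertools.accumulate (not what pandas does internally); it rebuilds each of them on the second example's values without the NaN, and confirms that cumprod only matches a factorial for the sequence 1, 2, ..., n.

```python
import math
import operator
from itertools import accumulate

import pandas as pd

values = [1, 2, 6, 4, 5]
s = pd.Series(values)

# Running aggregates: each output element combines everything up to that position
running = {
    "cummax": list(accumulate(values, max)),
    "cummin": list(accumulate(values, min)),
    "cumprod": list(accumulate(values, operator.mul)),
    "cumsum": list(accumulate(values)),  # default operation is addition
}
from_pandas = {
    "cummax": s.cummax().tolist(),
    "cummin": s.cummin().tolist(),
    "cumprod": s.cumprod().tolist(),
    "cumsum": s.cumsum().tolist(),
}
print(running)

# cumprod equals the factorial only for the sequence 1, 2, ..., n
factorials = pd.Series([1, 2, 3, 4, 5]).cumprod().tolist()
print(factorials == [math.factorial(k) for k in range(1, 6)])  # True
```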
describe(14): generates descriptive statistics (for numeric data: count, mean, std, min, percentiles and max).
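lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
print(dp.describe())
# Control which percentiles are reported
print(dp.describe(percentiles=[.70]))
# include/exclude are ignored for a Series; they only matter for a DataFrame
print(dp.describe(include=['category']))
print(dp.describe(exclude=[np.number]))
# Non-numeric data gets a different summary: count, unique, top, freq
print(pd.Series(['a', 'b', 'c']).describe(percentiles=[.70], exclude=[np.number]))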
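diff(15): subtracts an earlier element from a later one within the Series. With periods=1, each element minus the one before it; with periods=2, each element minus the one two positions earlier, and so on. A negative periods compares with the following elements instead.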
lists = [1, 2, 3, 4, 5, 10, np.nan, 100]
dp = pd.Series(lists)
print(dp.diff())
print(dp.diff(periods=2))
print(dp.diff(periods=-1))
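One way to read diff: it is the Series minus a shifted copy of itself. The sketch below checks that equivalence on the same data for positive and negative periods; NaN propagates in both versions at the same positions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 10, np.nan, 100])

results = {}
for periods in (1, 2, -1):
    # diff(periods=k) should match subtracting the series shifted by k positions
    manual = s - s.shift(periods)
    results[periods] = s.diff(periods=periods).equals(manual)

print(results)  # expected: {1: True, 2: True, -1: True}
```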
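factorize(16): encodes the object as an enumerated type or categorical variable. It returns a tuple (codes, uniques): an integer code for each element plus the distinct values; missing values get the code -1 by default.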
lists = [1, 2, 3, 4, 5, 10, np.nan, 100]
dp = pd.Series(lists)
print(type(dp.factorize()))
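The example above only prints the type of the result. The sketch below unpacks the (codes, uniques) tuple to show what the encoding looks like, first on the same numbers and then on a small made-up list of strings, where the categorical use is easier to see.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 10, np.nan, 100])
codes, uniques = s.factorize()

# codes: one integer per element; NaN gets the sentinel -1 by default
print(codes)    # [ 0  1  2  3  4  5 -1  6]
# uniques: the distinct non-missing values, in order of first appearance
print(uniques)

# Strings are encoded the same way, which is what makes this useful for categories
labels, levels = pd.Series(["b", "a", "b", "c"]).factorize()
print(labels, list(levels))  # [0 1 0 2] ['b', 'a', 'c']

# The original values can be rebuilt from the codes (no missing values here)
print(levels[labels].tolist())  # ['b', 'a', 'b', 'c']
```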
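kurt(17), mad(18), max(19), mean(20), median(21), min(22), mode(23): 1) kurtosis (Fisher's definition, so a normal distribution gives 0; normalized by N-1); 2) mean absolute deviation; 3) maximum; 4) mean; 5) median; 6) minimum; 7) the mode(s), returned as a sorted Series, since several values can tie. Note that mad() was deprecated in pandas 1.5 and removed in 2.0.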
lists = [1, 2, 3, 4]
dp = pd.Series(lists)
print(dp.kurt())
# dp.mad() was removed in pandas 2.0; this is the equivalent:
print((dp - dp.mean()).abs().mean())
print(dp.max())
print(dp.mean())
print(dp.median())
print(dp.min())
print(dp.mode())
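Because mad() no longer exists in pandas 2.x, the sketch below spells out the mean absolute deviation by hand, next to the median absolute deviation (a different statistic that is easy to confuse with it). It also shows that mode() can return several values. Both data sets are made-up examples.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 10])  # made-up data with one outlier

# Mean absolute deviation: average distance from the mean (what Series.mad() returned)
mean_abs_dev = (s - s.mean()).abs().mean()
# Median absolute deviation: median distance from the median, a different statistic
median_abs_dev = (s - s.median()).abs().median()
print(mean_abs_dev, median_abs_dev)  # 3.0 1.0

# mode() returns every most frequent value, sorted, so the result can have several rows
modes = pd.Series([3, 1, 3, 1, 2]).mode()
print(modes.tolist())  # [1, 3]
```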
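nlargest(24), nsmallest(25): select the n largest (smallest) values, n=5 by default. The keep parameter decides which elements survive when several equal values compete for the last place: 'first' (default) keeps the earliest occurrences, 'last' keeps the latest, and 'all' keeps every tied value, even if that returns more than n elements.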
lists = [1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 8, 10]
dp = pd.Series(lists)
print(dp.nlargest())
print(dp.nlargest(n=1))
print(dp.nlargest(keep='last'))
print(dp.nlargest(n=3, keep='all'))
print(dp.nsmallest())
print(dp.nsmallest(n=1))
print(dp.nsmallest(keep='last'))
print(dp.nsmallest(n=3, keep='all'))
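To see the keep rule in action, the sketch below uses the same data and asks for the top 3. After 10 and 8, three 7s (index labels 8, 9 and 10) compete for the last slot, and the index labels of the result show which ones were kept. nsmallest follows the same rule from the other end.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 8, 10])

first = s.nlargest(n=3)               # keep='first': the earliest 7 (label 8) wins
last = s.nlargest(n=3, keep="last")   # keep='last': the latest 7 (label 10) wins
every = s.nlargest(n=3, keep="all")   # keep='all': all three 7s kept, 5 rows in total

print(first.index.tolist())          # [12, 11, 8]
print(last.index.tolist())           # [12, 11, 10]
print(sorted(every.index.tolist()))  # [8, 9, 10, 11, 12]
```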
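pct_change(26): for values a1, a2, a3, ..., with periods=1 the results are (a2-a1)/a1, (a3-a2)/a2, and so on; with periods=k, each value is compared with the one k positions earlier. The first k positions are NaN.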
lists = [3, 2]
dp = pd.Series(lists)
print(dp.pct_change(periods=1))
print(dp.pct_change(periods=2))
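The formula above can be checked directly. The sketch below adds a third value to the example (an assumption, so that periods=2 produces a non-NaN result) and compares pct_change with the hand-written ratio built from shift().

```python
import pandas as pd

s = pd.Series([3, 2, 4])

results = {}
for k in (1, 2):
    builtin = s.pct_change(periods=k)
    # (value - value k positions earlier) / value k positions earlier
    manual = (s - s.shift(k)) / s.shift(k)
    results[k] = (builtin, manual)
    print(k, builtin.tolist())
```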
prod(27): computes the product of all values.
lists = [3, 3, 4]
dp = pd.Series(lists)
print(dp.prod())
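quantile(28): sort the n values in ascending order; the gaps between adjacent values form n-1 intervals. The quantile q (between 0 and 1) maps to the position q*(n-1) along these intervals, and with the default interpolation='linear' the result is interpolated linearly between the two values on either side of that position.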
lists = [1, 2, 3, 4, 5]
dp = pd.Series(lists)
# 5 values give 4 intervals; q=0.1 lands at position 0.1*4 = 0.4,
# i.e. 40% of the way from 1 to 2, which gives 1.4
print(dp.quantile(q=0.1))
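The interval rule described above can be written out as a small function. The sketch below is my own reconstruction of the default 'linear' interpolation (the helper name linear_quantile is hypothetical, not part of pandas) and compares it with Series.quantile for a few values of q.

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between sorted neighbours."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)          # fractional position along the n - 1 intervals
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)    # stay inside the list when q == 1
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


data = [1, 2, 3, 4, 5]
for q in (0.1, 0.25, 0.5, 0.9):
    print(q, linear_quantile(data, q), pd.Series(data).quantile(q))
```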
rank(29): returns the rank of each value (1 for the smallest by default); tied values share their average rank.
lists = [1, 2, 3, 10, 6]
dp = pd.Series(lists)
print(dp.rank())
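The example above has no ties, so the sketch below adds a second 6 to show how ties are handled. method and ascending are documented parameters of Series.rank; the result is always float.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 10, 6, 6])  # the example above plus a tied 6

print(s.rank().tolist())                 # average: [1.0, 2.0, 3.0, 6.0, 4.5, 4.5]
print(s.rank(method="min").tolist())     # [1.0, 2.0, 3.0, 6.0, 4.0, 4.0]
print(s.rank(method="dense").tolist())   # [1.0, 2.0, 3.0, 5.0, 4.0, 4.0]
print(s.rank(method="first").tolist())   # [1.0, 2.0, 3.0, 6.0, 4.0, 5.0]
print(s.rank(ascending=False).tolist())  # [6.0, 5.0, 4.0, 1.0, 2.5, 2.5]
```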
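sem(30): standard error of the mean, i.e. the sample standard deviation divided by the square root of the number of values. I have not studied the theory behind it yet, so I am setting it aside for now.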
lists = [1, 2, 3, 10, 6]
dp = pd.Series(lists)
print(dp.sem())
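A minimal sketch of the definition above on the same data: compute the sample standard deviation (dividing by n - 1) by hand, then divide by the square root of n. It should agree with Series.sem().

```python
import math

import pandas as pd

values = [1, 2, 3, 10, 6]
n = len(values)
mean = sum(values) / n
# Sample standard deviation: divide by n - 1 (pandas' default ddof=1)
std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
# Standard error of the mean: how much the sample mean is expected to vary
sem_manual = std / math.sqrt(n)

print(sem_manual, pd.Series(values).sem())
```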
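skew(31), std(32): 1) sample skewness (unbiased, normalized by N-1); 2) sample standard deviation (ddof=1 by default). I have not studied the rules behind these yet, so I am setting them aside for now.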
lists = [1, 2, 3, 10, 6]
dp = pd.Series(lists)
print(dp.skew())
print(dp.std())
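To demystify skew and std, the sketch below rebuilds both from sums of powered deviations on the same data. The skewness formula is the small-sample adjusted one, n * sqrt(n - 1) / (n - 2) * m3 / m2^1.5; that this is what pandas uses is my assumption, checked only by comparing the numbers. A positive skew means the long tail is on the right (here, the 10).

```python
import math

import pandas as pd

values = [1, 2, 3, 10, 6]
n = len(values)
mean = sum(values) / n
dev = [v - mean for v in values]
m2 = sum(d ** 2 for d in dev)  # sum of squared deviations
m3 = sum(d ** 3 for d in dev)  # sum of cubed deviations

# Sample standard deviation: divide by n - 1 (pandas' default ddof=1)
std_manual = math.sqrt(m2 / (n - 1))

# Small-sample adjusted skewness (assumed to match pandas; needs n >= 3)
skew_manual = n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5

s = pd.Series(values)
print(std_manual, s.std())
print(skew_manual, s.skew())
```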
sum(33): the sum of the values.
lists = [1, 2]
dp = pd.Series(lists)
print(dp.sum())
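var(34): unbiased sample variance (normalized by N-1, ddof=1 by default); NaN values are skipped. I have not studied the rule behind it yet, so I am setting it aside for now.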
lists = [1, 2, np.nan]
dp = pd.Series(lists)
print(dp.var())
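The difference between the unbiased (ddof=1) and the population (ddof=0) variance is only the divisor. The sketch below works it out on the same data: NaN is dropped first, the squared deviations of 1 and 2 from their mean 1.5 add up to 0.5, and that sum is divided by n - 1 = 1 or by n = 2.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])

clean = s.dropna()  # var() skips NaN by default (skipna=True)
n = len(clean)
squared_dev = ((clean - clean.mean()) ** 2).sum()

unbiased = squared_dev / (n - 1)  # ddof=1, the default: 0.5 here
population = squared_dev / n      # ddof=0: 0.25 here

print(unbiased, s.var())
print(population, s.var(ddof=0))
print(s.var(skipna=False))        # nan: the missing value is no longer skipped
```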
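kurtosis(35): computes kurtosis; it is an alias of kurt(17). It needs at least four non-missing values; with fewer, as in the two-value example below, the result is NaN. I have not studied the rule behind it yet, so I am setting it aside for now.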
lists = [1, 2]
dp = pd.Series(lists)
print(dp.kurtosis())
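The sketch below writes out the small-sample adjusted excess kurtosis (a normal distribution gives 0). The helper name excess_kurtosis is hypothetical, and that this formula is exactly what pandas computes is my assumption, checked by comparing the numbers. It also shows why the two-value example returns NaN. For 1, 2, 3, 4 the result is about -1.2: a flat distribution with no tails.

```python
import pandas as pd


def excess_kurtosis(values):
    """Hypothetical helper: sample excess kurtosis with small-sample adjustment.

    Assumes the values are not all equal (otherwise m2 is 0).
    """
    n = len(values)
    if n < 4:
        return float("nan")  # undefined for fewer than four values
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d ** 2 for d in dev)
    m4 = sum(d ** 4 for d in dev)
    numerator = n * (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 ** 2
    adjustment = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return numerator / denominator - adjustment


for data in ([1, 2], [1, 2, 3, 4], [1, 2, 3, 10, 6]):
    print(data, excess_kurtosis(data), pd.Series(data).kurtosis())
```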
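unique(36), nunique(37), is_unique(38): 1) returns the distinct values in order of appearance (NaN included); 2) returns the number of distinct values (NaN excluded by default); 3) checks whether all values in the Series are distinct.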
lists = [1, 2, 2, 3, np.nan]
dp = pd.Series(lists)
print(dp.unique())
print(dp.nunique())
print(dp.is_unique)
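is_monotonic(39), is_monotonic_increasing(40), is_monotonic_decreasing(41): 1) checks whether the values are monotonically increasing (non-strict, so equal neighbours are allowed); 2) the same as 1): is_monotonic was an alias of is_monotonic_increasing, was deprecated in pandas 1.5 and removed in 2.0; 3) checks whether the values are monotonically decreasing.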
lists = [1, 2, 2, 3]
dp = pd.Series(lists)
# dp.is_monotonic was removed in pandas 2.0; is_monotonic_increasing replaces it
print(dp.is_monotonic_increasing)
print(dp.is_monotonic_decreasing)
value_counts(42): counts how often each distinct value occurs, sorted by count in descending order; NaN is excluded by default.
lists = [1, 2, 2, 3, np.nan]
dp = pd.Series(lists)
print(dp.value_counts())
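Two documented options of value_counts are handy in practice: normalize=True returns relative frequencies instead of counts, and dropna=False keeps NaN as its own group. The sketch below applies both to the same data; the shares are computed over the four non-missing values.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, 3, np.nan])

counts = s.value_counts()                # NaN is dropped by default
shares = s.value_counts(normalize=True)  # relative frequencies instead of counts
with_nan = s.value_counts(dropna=False)  # keep NaN as its own group

print(counts)
print(shares)
print(with_nan)
```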
