# NumPy

## 1. Logical operations

- AND: np.logical_and()
- OR: np.logical_or()
- XOR: np.logical_xor()
- NOT: np.logical_not()

```python
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))
```
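
The boolean arrays returned by these functions can also be used as masks to select elements. The following is a minimal sketch that reuses the my_house and your_house arrays; the names mask and exactly_one_small are illustrative and not part of the original notes.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Boolean mask: rooms in my_house larger than 18.5 or smaller than 10
mask = np.logical_or(my_house > 18.5, my_house < 10)
print(mask)            # element-wise True/False
print(my_house[mask])  # keeps only the elements where mask is True

# XOR: exactly one of the two houses has a room smaller than 11
exactly_one_small = np.logical_xor(my_house < 11, your_house < 11)
print(exactly_one_small)

# NOT: invert a boolean array
print(np.logical_not(my_house < 11))
```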

# Pandas

## 1. import file

```python
import pandas as pd
data_df = pd.read_csv('data.csv', index_col=0)  # use the first column as the index
```

## 2. DataFrame and Series

A DataFrame can store columns of different data types.

A Series is similar to a NumPy ndarray: a single column can hold only one data type.

```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Print out country column as Pandas Series
print(cars['country'])  # output Series

# Print out country column as Pandas DataFrame
print(cars[['country']])  # output DataFrame

# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])
```
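
To see the difference between single and double brackets without the CSV file, here is a small sketch. The in-memory cars table is a hypothetical stand-in for cars.csv, and its values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv (values are illustrative only)
cars = pd.DataFrame(
    {"country": ["United States", "Japan"], "drives_right": [True, False]},
    index=["US", "JPN"],
)

country_series = cars["country"]   # single brackets return a Series
country_frame = cars[["country"]]  # double brackets return a DataFrame

print(type(country_series).__name__)
print(type(country_frame).__name__)
print(cars.dtypes)  # each column keeps its own dtype
```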

## 3. loc and iloc

- loc selects by index label
- iloc selects by the integer position of the row and column

```python
# Row lookup
# Print out observation for Japan
print(cars.loc[['JPN']])  # select by index label
print(cars.iloc[[2]])     # select by index position

# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])
print(cars.iloc[[1, 6]])

# Row and column lookup
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Print out drives_right value of Morocco
print(cars.loc['MOR', 'drives_right'])

# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
```
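
The sketch below shows loc and iloc returning the same rows when the label and the position refer to the same entry. The three-row table is a hypothetical stand-in for cars.csv, so the positions differ from those in the real file.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv (values are illustrative only)
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "Egypt"],
     "drives_right": [True, False, True]},
    index=["US", "JPN", "EG"],
)

# Same row, selected by label and by position
by_label = cars.loc[["JPN"]]
by_position = cars.iloc[[1]]
print(by_label.equals(by_position))

# Single value: row label + column label vs. row position + column position
value = cars.loc["EG", "drives_right"]
same_value = cars.iloc[2, 1]
print(value, same_value)

# Sub-DataFrame: lists of row labels and column labels
sub = cars.loc[["US", "EG"], ["country"]]
print(sub)
```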

## 4. Filtering DataFrame data

```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Convert code to a one-liner
# cars['drives_right'] is a boolean Series (True/False) used as the row mask
sel = cars[cars['drives_right']]

# Print sel
print(sel)
```
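
Filtering also works with comparison results, and the NumPy logical functions from earlier can combine several conditions. This is a sketch under assumptions: the cars_per_cap column and all values are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cars.csv (column values are illustrative only)
cars = pd.DataFrame(
    {"cars_per_cap": [809, 588, 18],
     "drives_right": [True, False, True]},
    index=["US", "JPN", "IN"],
)

# A comparison produces a boolean Series that selects matching rows
many_cars = cars[cars["cars_per_cap"] > 500]
print(many_cars)

# Combine two conditions element-wise with np.logical_and
both = cars[np.logical_and(cars["cars_per_cap"] > 500, cars["drives_right"])]
print(both)
```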
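
### Iterating over a dictionary, an array, and a DataFrame

```python
# Import numpy as np
import numpy as np
import pandas as pd

# Definition of dictionary
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin',
          'norway': 'oslo', 'italy': 'rome', 'poland': 'warsaw', 'austria': 'vienna'}

# Iterate over europe
for key, value in europe.items():
    print("the capital of " + key + " is " + value)

# For loop over np_baseball (assumed to be an existing NumPy array)
for i in np.nditer(np_baseball):  # np.nditer visits every element of the array
    print(i)

# Iterate over rows of cars
cars = pd.read_csv('cars.csv', index_col=0)
for lab, row in cars.iterrows():  # iterate over the rows
    print(lab)  # index label
    print(row)  # row data as a Series
```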
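
Inside an iterrows() loop, each row is a Series, so individual fields can be read by column name. A minimal sketch with a hypothetical two-row table; the name_lengths dictionary is an illustrative helper.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv (values are illustrative only)
cars = pd.DataFrame({"country": ["Japan", "Morocco"]}, index=["JPN", "MOR"])

name_lengths = {}
for lab, row in cars.iterrows():
    # lab is the index label; row is a Series holding that row's values
    name_lengths[lab] = len(row["country"])

print(name_lengths)
```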

## 5. Using apply()

```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Use .apply(str.upper)
cars['COUNTRY'] = cars['country'].apply(str.upper)  # pass the function itself; it is applied to every value
print(cars)
```
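
apply() accepts any callable that takes a single value, including built-ins and lambdas. A short sketch with a hypothetical two-row table; the name_length and initial columns are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv (values are illustrative only)
cars = pd.DataFrame({"country": ["Japan", "Morocco"]}, index=["JPN", "MOR"])

# Built-in function: length of each country name
cars["name_length"] = cars["country"].apply(len)

# Lambda: first letter of each country name
cars["initial"] = cars["country"].apply(lambda name: name[0])

print(cars)
```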
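
## 6. Random

Generating random numbers

```python
import numpy as np

# Set the seed
np.random.seed(123)  # with a fixed seed, the same random numbers are generated on every run

# Generate and print random float
print(np.random.rand())  # with no arguments, returns a single float in [0, 1)

# Use randint() to simulate a dice
print(np.random.randint(1, 7))  # random integer from 1 to 6; the upper bound 7 is exclusive
```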
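
The effect of the seed can be checked directly: resetting it to the same value reproduces the same sequence. A minimal sketch; roll_dice is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def roll_dice(seed, n_rolls):
    """Return n_rolls simulated dice throws after resetting the seed."""
    np.random.seed(seed)
    return [int(np.random.randint(1, 7)) for _ in range(n_rolls)]

first = roll_dice(123, 5)
second = roll_dice(123, 5)

print(first)
print(first == second)  # same seed, same sequence
```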

# Seaborn

Seaborn is a high-level plotting library built on top of Matplotlib.

```python
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# gdp, phones, and region are assumed to be existing lists or arrays

# Create scatter plot with GDP on the x-axis and number of phones on the y-axis
sns.scatterplot(x=gdp, y=phones)  # scatter plot
plt.show()

# Create count plot with region on the y-axis
sns.countplot(y=region)  # bar chart of counts per category

# Show plot
plt.show()
```

## 1. sns.scatterplot()

- x, y: the variables on the x-axis and y-axis
- hue: the variable that defines the color subgroups
- hue_order: the order of the subgroups in the legend
- palette: the colors of the subgroups

```python
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# student_data is assumed to be an existing DataFrame

# Create a dictionary mapping subgroup values to colors
palette_colors = {"Rural": "green", "Urban": "blue"}

# Change the legend order in the scatter plot
sns.scatterplot(x="absences", y="G3",
                data=student_data,
                hue="location",
                hue_order=['Rural', 'Urban'],
                palette=palette_colors)

# Show plot
plt.show()
```

## 2. sns.relplot(): any kind of relational plot

### Scatter plot

```python
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# student_data and mpg are assumed to be existing DataFrames

# Scatter plot
# Adjust further to add subplots based on family support
sns.relplot(x="G1", y="G3",
            data=student_data,
            kind="scatter",           # scatter plot
            col="schoolsup",          # split into subplot columns by this variable
            col_order=["yes", "no"],  # order of the columns
            row='famsup',             # split into subplot rows by this variable
            row_order=['yes', 'no'])

# Show plot
plt.show()

# Create scatter plot of horsepower vs. mpg
sns.relplot(x="horsepower", y="mpg",
            data=mpg,
            kind="scatter",
            size="cylinders",  # vary point size by this variable
            style='origin',    # vary marker style by this variable
            hue='cylinders')   # vary color by this variable

# Show plot
plt.show()
```
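
### Line plot

```python
# Line plot
# Add markers and make each line have the same style
sns.relplot(x="model_year", y="horsepower",
            data=mpg,
            kind="line",
            ci=None,        # confidence interval: None hides it, "sd" shows the standard deviation
                            # (seaborn 0.12 and later prefer errorbar=None)
            style="origin",
            hue="origin",
            markers=True,   # show a marker at each data point
            dashes=False)   # draw every line solid so all lines share one style

# Show plot
plt.show()
```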

### Bar plot: countplot() and catplot()

```python
# Create a bar plot of G3 for each study_time category
sns.catplot(x="study_time", y="G3",
            data=student_data,
            kind="bar",  # bar plot
            order=["<2 hours", "2 to 5 hours",
                   "5 to 10 hours", ">10 hours"],  # display order of the categories
            ci=None)     # hide the confidence intervals

# Show plot
plt.show()
```
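
### Box plot

```python
# Create a box plot with subgroups and omit the outliers
sns.catplot(x='internet', y='G3',
            kind='box',
            data=student_data,
            hue='location',
            sym='',          # empty marker string hides the outlier points
            whis=[0, 100])   # whisker limits: a single number is a multiple of the IQR
                             # beyond the quartiles; a pair gives percentiles (here min and max)

# Show plot
plt.show()
```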
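
### Point plot: pointplot()

Unlike lineplot(), a point plot summarizes a categorical (non-continuous) variable on the x-axis.

```python
# Remove the lines joining the points
sns.catplot(x="famrel", y="absences",
            data=student_data,
            kind="point",
            capsize=0.2,  # width of the caps on the confidence interval bars
            join=False)   # remove the lines joining the points

# Show plot
plt.show()
```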
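
## 3. Custom styles

```python
import matplotlib.pyplot as plt
import seaborn as sns

# survey_data is assumed to be an existing DataFrame

# Set the style to "whitegrid"
sns.set_style('whitegrid')  # chart background
sns.set_palette("Purples")  # bar colors
# sns.set_palette("RdBu")
# Contexts from smallest to largest: "paper", "notebook" (default), "talk", "poster"
sns.set_context("poster")   # text scale; "poster" is the largest

# Create a count plot of survey responses
category_order = ["Never", "Rarely", "Sometimes",
                  "Often", "Always"]
g = sns.catplot(x="Parents Advice",
                data=survey_data,
                kind="count",
                order=category_order)

# Title for a FacetGrid, as returned by relplot() and catplot()
g.fig.suptitle('Car Weight vs. Horsepower')

# Title for an Axes object, as returned by e.g. lineplot(); a FacetGrid has no set_title()
# Add a title "Average MPG Over Time"
# ax.set_title("Average MPG Over Time")

# Add x-axis and y-axis labels
g.set(xlabel='Car Model Year',
      ylabel='Average MPG')

# Rotate x-tick labels
plt.xticks(rotation=90)

# Show plot
plt.show()
```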
