
# Numpy

## 1. Logical operations

- AND: `np.logical_and()`
- OR: `np.logical_or()`
- XOR: `np.logical_xor()`
- NOT: `np.logical_not()`

```python
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))
```
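
The remaining two operators, and the use of a boolean result as a mask, can be sketched as follows. The threshold of 15 and the variable names are only illustrative; the arrays are the same ones used in these notes.

```python
import numpy as np

# Same room areas as in the notes above
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# XOR: True where exactly one of the two conditions holds
exactly_one_large = np.logical_xor(my_house > 15, your_house > 15)

# NOT: invert a boolean array
not_small = np.logical_not(my_house < 11)

# A boolean array can be used as a mask to select elements
small_in_both = my_house[np.logical_and(my_house < 11, your_house < 11)]

print(exactly_one_large)  # [ True False False False]
print(not_small)          # [ True  True False False]
print(small_in_both)      # [9.5]
```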

<a name="lc5wL"></a>
# Pandas
<a name="2bMAY"></a>
## 1. Import a file
```python
import pandas as pd
data_df = pd.read_csv('data.csv', index_col=0)  # use the first column as the index
```

## 2. DataFrame and Series

- A DataFrame can hold columns of different data types.
- A Series is similar to a NumPy `ndarray`: a single column holds only one data type.

```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)
# Print out country column as Pandas Series
print(cars['country'])  # single brackets return a Series
# Print out country column as Pandas DataFrame
print(cars[['country']])  # double brackets return a DataFrame
# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])
```
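
To check the single-bracket versus double-bracket behaviour without `cars.csv`, here is a minimal sketch. The in-memory DataFrame is a hypothetical three-row stand-in for the file, not its real contents.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv, built in memory for illustration
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
        "drives_right": [True, False, False],
    },
    index=["US", "AUS", "JPN"],
)

single = cars["country"]    # single brackets -> Series
double = cars[["country"]]  # double brackets -> one-column DataFrame

print(type(single).__name__)
print(type(double).__name__)
# Each column keeps its own dtype inside the DataFrame
print(cars.dtypes)
```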

## 3. loc and iloc

- `loc` selects by index label
- `iloc` selects by integer row and column position

```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Row selection
# Print out observation for Japan
print(cars.loc[['JPN']])  # select by index label
print(cars.iloc[[2]])     # select by index position

# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])
print(cars.iloc[[1, 6]])

# Row and column selection
# Print out drives_right value of Morocco
print(cars.loc['MOR', 'drives_right'])

# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
```
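
The equivalence between label-based and position-based selection can be checked on a small example. This is a sketch on a hypothetical three-row DataFrame, so the positions differ from those in the real `cars.csv`.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv, built in memory for illustration
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
        "drives_right": [True, False, False],
    },
    index=["US", "AUS", "JPN"],
)

# The same row, selected by label and by position
row_by_label = cars.loc[["JPN"]]
row_by_position = cars.iloc[[2]]

# A single cell: row label plus column label
aus_drives_right = cars.loc["AUS", "drives_right"]

# The same sub-DataFrame, selected by labels and by positions
sub_by_label = cars.loc[["US", "AUS"], ["country", "drives_right"]]
sub_by_position = cars.iloc[[0, 1], [1, 2]]

print(row_by_label.equals(row_by_position))
print(aus_drives_right)
print(sub_by_label.equals(sub_by_position))
```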

<a name="nvvXs"></a>
## 4. Filtering a DataFrame
```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)
# Convert code to a one-liner
# cars['drives_right'] is a boolean Series (True/False values);
# using it as a mask keeps only the rows where it is True
sel = cars[cars['drives_right']]
# Print sel
print(sel)
```
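
Filtering also works with comparisons, and conditions can be combined with the NumPy logical functions from the first section. Below is a minimal sketch on a hypothetical in-memory DataFrame; the thresholds are arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cars.csv, built in memory for illustration
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
        "drives_right": [True, False, False],
    },
    index=["US", "AUS", "JPN"],
)

# A comparison on a column yields a boolean Series
many_cars = cars["cars_per_cap"] > 600
car_maniac = cars[many_cars]

# Combine two conditions element-wise with np.logical_and
cpc = cars["cars_per_cap"]
medium = cars[np.logical_and(cpc > 500, cpc < 800)]

print(car_maniac)
print(medium)
```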

### Iterating over dictionaries, arrays, and DataFrames

```python
# Import numpy as np
import numpy as np
# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
# Iterate over europe
for key, value in europe.items():
    print("the capital of " + key + " is " + value)

# np_baseball is assumed to be a 2D NumPy array and cars a DataFrame, both loaded earlier
# For loop over np_baseball
for i in np.nditer(np_baseball):  # np.nditer visits every element of an n-dimensional array
    print(i)
# Iterate over rows of cars
for lab, row in cars.iterrows():  # iterate over the rows
    print(lab)  # index label
    print(row)  # row data as a Series
```

## 5. Using apply()

```python
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)
# Use .apply(str.upper)
cars['COUNTRY'] = cars['country'].apply(str.upper)  # pass the function itself, without calling it
print(cars)
```
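
For comparison, the same kind of derived column can be built with an explicit `iterrows()` loop or with `apply()`. The sketch below uses a hypothetical in-memory DataFrame and hypothetical column names (`name_length`, `NAME_LENGTH`); `apply()` is usually the shorter and faster option.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv, built in memory for illustration
cars = pd.DataFrame(
    {"country": ["United States", "Australia", "Japan"]},
    index=["US", "AUS", "JPN"],
)

# Option 1: loop over the rows and write each value with .loc
for lab, row in cars.iterrows():
    cars.loc[lab, "name_length"] = len(row["country"])

# Option 2: let apply() call len on every element of the column
cars["NAME_LENGTH"] = cars["country"].apply(len)

print(cars)
```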

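## 6. Random

Generating random numbers:

- `np.random.seed()`
- `np.random.rand()`
- `np.random.randint()`

```python
# Import numpy as np
import numpy as np

# Set the seed
np.random.seed(123)  # with a fixed seed, the same random numbers are generated on every run

# Generate and print random float
print(np.random.rand())  # with no arguments, returns a single float in [0, 1)

# Use randint() to simulate a dice
print(np.random.randint(1, 7))  # returns an integer from 1 to 6; the bounds must be given
```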
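
The effect of the seed can be verified directly: resetting it to the same value replays the same sequence. This is a small sketch; the `size=10` argument and the variable names are only for illustration.

```python
import numpy as np

# Same seed, same first draw
np.random.seed(123)
first = np.random.rand()
np.random.seed(123)
second = np.random.rand()
print(first == second)

# Simulate ten dice rolls at once; the upper bound 7 is exclusive
rolls = np.random.randint(1, 7, size=10)
print(rolls)
```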

<a name="nvkDV"></a>
# Seaborn
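Seaborn is a high-level plotting library built on top of Matplotlib.
```python
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# gdp, phones, and region are assumed to be lists (or arrays) loaded earlier
# Create scatter plot with GDP on the x-axis and number of phones on the y-axis
sns.scatterplot(x=gdp, y=phones)  # scatter plot
# Show plot
plt.show()
# Create count plot with region on the y-axis
sns.countplot(y=region)  # count plot: one bar per category
# Show plot
plt.show()
```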

## 1. sns.scatterplot()

- `x`, `y`: variables for the x-axis and y-axis
- `hue`: variable that determines the color of the points
- `hue_order`: order of the hue levels in the legend
- `palette`: colors assigned to the hue levels

```python
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create a dictionary mapping subgroup values to colors
palette_colors = {"Rural": "green", "Urban": "blue"}

# Change the legend order in the scatter plot
sns.scatterplot(x="absences", y="G3",
                data=student_data,
                hue="location",
                hue_order=["Rural", "Urban"],
                palette=palette_colors)

# Show plot
plt.show()
```

<a name="PfIRx"></a>
## 2. sns.relplot() for any relational plot type
<a name="qfNXT"></a>
### Scatter plot
```python
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns
# Scatter plot
# Adjust further to add subplots based on family support
sns.relplot(x="G1", y="G3",
            data=student_data,
            kind="scatter",           # scatter plot
            col="schoolsup",          # one subplot column per value
            col_order=["yes", "no"],  # order of the columns
            row="famsup",             # one subplot row per value
            row_order=["yes", "no"])  # order of the rows
# Show plot
plt.show()
# Create scatter plot of horsepower vs. mpg
sns.relplot(x="horsepower", y="mpg",
            data=mpg,
            kind="scatter",
            size="cylinders",  # vary the point size
            style="origin",    # vary the marker style
            hue="cylinders")   # vary the color
# Show plot
plt.show()
```

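### Line plot

```python
# Line plot
# Add markers and make each line have the same style
sns.relplot(x="model_year", y="horsepower",
            data=mpg,
            kind="line",
            ci="sd",  # shade the standard deviation instead of a confidence interval (seaborn >= 0.12 prefers errorbar="sd")
            style="origin",
            hue="origin",
            markers=True,  # show a marker at each data point
            dashes=False)  # draw every line solid
# Show plot
plt.show()
```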

### Bar plot: countplot() and catplot()

```python
# Create a bar plot of the average G3 grade for each study time category
sns.catplot(x="study_time", y="G3",
            data=student_data,
            kind="bar",  # bar plot
            order=["<2 hours",
                   "2 to 5 hours",
                   "5 to 10 hours",
                   ">10 hours"],  # display order of the categories
            ci=None)  # turn off the confidence intervals
# Show plot
plt.show()
```

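### Box plot

```python
# Create a box plot with subgroups and omit the outliers
sns.catplot(x="internet", y="G3",
            kind="box",
            data=student_data,
            hue="location",
            sym="",          # empty marker string hides the outlier points
            whis=[0, 100])   # whiskers at the 0th and 100th percentiles (min and max); a single number instead scales the IQR
# Show plot
plt.show()
```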

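### Point plot: pointplot()

Unlike `lineplot()`, a point plot summarizes a categorical (non-continuous) variable on the x-axis.

```python
# Remove the lines joining the points
sns.catplot(x="famrel", y="absences",
            data=student_data,
            kind="point",
            capsize=0.2,  # width of the caps on the confidence-interval bars
            join=False)   # remove the connecting lines
# Show plot
plt.show()
```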

## 3. Custom styles

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style to "whitegrid"
sns.set_style("whitegrid")  # figure background
sns.set_palette("Purples")  # color palette for the plot elements
# sns.set_palette("RdBu")
# Contexts from smallest to largest: "paper", "notebook" (default), "talk", "poster"
sns.set_context("poster")  # scale of text and plot elements; "poster" is the largest

# Create a count plot of survey responses (survey_data is assumed to be loaded earlier)
category_order = ["Never", "Rarely", "Sometimes",
                  "Often", "Always"]
g = sns.catplot(x="Parents Advice",
                data=survey_data,
                kind="count",
                order=category_order)
# Title for a FacetGrid, the object returned by catplot() and relplot()
g.fig.suptitle("Car Weight vs. Horsepower")
# Title for an Axes, the object returned by axes-level functions such as lineplot():
# ax.set_title("Average MPG Over Time")
# Add x-axis and y-axis labels
g.set(xlabel="Car Model Year", ylabel="Average MPG")
# Rotate x-tick labels
plt.xticks(rotation=90)
# Show plot
plt.show()
```

Machine learning

NumPy, Pandas, Matplotlib and Seaborn