using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"]) # construct a DataFrame

4 rows × 2 columns

| Row | A (Int64) | B (String) |
|-----|-----------|------------|
| 1 | 1 | M |
| 2 | 2 | F |
| 3 | 3 | F |
| 4 | 4 | M |

df[!,:B] = String["F","M","F","M"] # R-like indexing; ! means the column is accessed without copying
4-element Array{String,1}:
 "F"
 "M"
 "F"
 "M"
df # the original table changed along with the assignment

4 rows × 2 columns

| Row | A (Int64) | B (String) |
|-----|-----------|------------|
| 1 | 1 | F |
| 2 | 2 | M |
| 3 | 3 | F |
| 4 | 4 | M |
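The difference between the `!` row selector used above and the `:` selector can be checked directly. The following is a minimal sketch, assuming a recent DataFrames.jl release; the variable names `same_col` and `copy_col` are made up for illustration.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

same_col = df[!, :B]   # the column vector stored in df, no copy
copy_col = df[:, :B]   # a freshly allocated copy of that column

copy_col[1] = "changed"   # only the copy is modified
same_col[2] = "changed"   # writes through to df

println(same_col === df.B)   # identity check against the stored column
println(df.B)
```

Only the write through `same_col` shows up in `df`, which is why `!` is the selector to reach for when a column should be modified in place.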

df[:,Symbol("B")] # Julia indexes columns by Symbol, much like sym() in R
4-element Array{String,1}:
 "F"
 "M"
 "F"
 "M"
names(df) # get the column names; they cannot be assigned to directly
2-element Array{String,1}:
 "A"
 "B"
propertynames(df)
2-element Array{Symbol,1}:
 :A
 :B
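Since the result of `names(df)` cannot be assigned to, renaming goes through `rename!` instead. A small sketch, assuming DataFrames.jl; the new names `:id` and `:sex` are hypothetical.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["F", "M", "F", "M"])

rename!(df, :A => :id)       # rename a single column in place
rename!(df, [:id, :sex])     # or supply a full vector of new names

println(names(df))
```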
size(df) ## check the dimensions
(4, 2)
push!(df,[5,"M"]) # append a row
df

5 rows × 2 columns

| Row | A (Int64) | B (String) |
|-----|-----------|------------|
| 1 | 1 | F |
| 2 | 2 | M |
| 3 | 3 | F |
| 4 | 4 | M |
| 5 | 5 | M |
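Besides a plain vector, `push!` also accepts rows that carry their column names, which avoids relying on column order. The sketch below assumes DataFrames.jl and also shows `append!` for adding several rows at once; the extra rows are invented for illustration.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["F", "M", "F", "M"])

push!(df, (A = 5, B = "M"))              # NamedTuple: matched by column name
push!(df, Dict(:A => 6, :B => "F"))      # Dict with Symbol keys works too
append!(df, DataFrame(A = 7:8, B = ["F", "M"]))   # add several rows from another table

println(size(df))
```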

df[!,Not(:A)] # inverse selection: every column except A

5 rows × 1 columns

| Row | B (String) |
|-----|------------|
| 1 | F |
| 2 | M |
| 3 | F |
| 4 | M |
| 5 | M |

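df[df.A .> 3,:] # rows can be selected with a mask; note the broadcasting operator .>

2 rows × 2 columns

| Row | A (Int64) | B (String) |
|-----|-----------|------------|
| 1 | 4 | M |
| 2 | 5 | M |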
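The same row selection can also be written with `filter`, which takes a predicate instead of a Boolean mask. This is a sketch under the assumption of DataFrames.jl and Julia 1.2 or later (for the curried `>(3)`); the result names are illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:5, B = ["F", "M", "F", "M", "M"])

by_mask   = df[df.A .> 3, :]                 # Boolean mask, as above
by_filter = filter(:A => >(3), df)           # predicate applied to column A
by_row    = filter(row -> row.A > 3 && row.B == "M", df)   # predicate on whole rows

println(by_mask == by_filter)
```

The `column => predicate` form is usually the faster of the two `filter` variants, since it does not build a row object for every row.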

When broadcasting with in.(items, collection) or items .∈ collection, both items and collection are broadcasted over, which is often not what is intended. For example, if both arguments are vectors (and the dimensions match), the result is a vector indicating whether each value in collection items is in the value at the corresponding position in collection. To get a vector indicating whether each value in items is in collection, wrap collection in a tuple or a Ref like this: in.(items, Ref(collection)) or items .∈ Ref(collection).

in.([1,2,3,4,5],[1,2,3,4,6]) ## odd syntax: broadcasting in over two vectors requires equal lengths and compares element by element
5-element BitArray{1}:
 1
 1
 1
 1
 0
in.([1,2,3,4],Ref([1,2])) ## for unequal lengths, wrap the collection in Ref
4-element BitArray{1}:
 1
 1
 0
 0
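df[in.(df.A,Ref([1,2])),:] ## in is ultimately an ordinary function: in(x, collection) is the same as x in collection

2 rows × 2 columns

| Row | A (Int64) | B (String) |
|-----|-----------|------------|
| 1 | 1 | F |
| 2 | 2 | M |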
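The membership test from the quoted documentation can be written in a few equivalent ways. The sketch below uses only Base Julia; the variable names are illustrative.

```julia
items = [1, 2, 3, 4]
collection = [1, 2]

# Ref shields `collection` from broadcasting, so every item is tested against all of it.
mask_ref = in.(items, Ref(collection))

# A one-element tuple shields it in the same way.
mask_tuple = in.(items, (collection,))

# in(collection) is the curried form, a function x -> x in collection.
mask_curried = map(in(collection), items)

println(mask_ref)
```

Any of these masks can be used as the row index of a DataFrame, as in the `df[in.(df.A, Ref([1,2])), :]` call above.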

df = DataFrame(x1=[1, 2], x2=[3, 4], y=[5, 6])

2 rows × 3 columns

| Row | x1 (Int64) | x2 (Int64) | y (Int64) |
|-----|------------|------------|-----------|
| 1 | 1 | 3 | 5 |
| 2 | 2 | 4 | 6 |

select(df,Not(:x1)) ## use select to choose columns

2 rows × 2 columns

| Row | x2 (Int64) | y (Int64) |
|-----|------------|-----------|
| 1 | 3 | 5 |
| 2 | 4 | 6 |
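`Not` is only one of several column selectors that `select` understands. The following sketch, assuming DataFrames.jl, lists a few others on the same table; the result names are made up for illustration.

```julia
using DataFrames

df = DataFrame(x1 = [1, 2], x2 = [3, 4], y = [5, 6])

without_x1 = select(df, Not(:x1))             # everything except x1
by_regex   = select(df, r"^x")                # columns whose name starts with "x"
by_range   = select(df, Between(:x1, :x2))    # contiguous range of columns
reordered  = select(df, [:y, :x1])            # explicit list, in the given order

println(names(reordered))
```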

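select(df,:x1,:x2=>(x->x*2)=>:x2) # an anonymous function lets you mutate while selecting; => works like a pipe that passes the data along

2 rows × 2 columns

| Row | x1 (Int64) | x2 (Int64) |
|-----|------------|------------|
| 1 | 1 | 6 |
| 2 | 2 | 8 |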
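The `source => function => target` pattern used above generalizes to several source columns, in which case the function receives one argument per column. A minimal sketch, assuming DataFrames.jl; the target names `:x2_doubled` and `:x1_plus_x2` are hypothetical.

```julia
using DataFrames

df = DataFrame(x1 = [1, 2], x2 = [3, 4], y = [5, 6])

# One source column: the function receives the whole column vector.
doubled = select(df, :x1, :x2 => (x -> 2 .* x) => :x2_doubled)

# Two source columns: the function receives two vectors.
summed = select(df, [:x1, :x2] => ((a, b) -> a .+ b) => :x1_plus_x2)

println(doubled)
println(summed)
```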

select(df,:x2,:x2=>ByRow(sqrt)) ## ByRow(f): another odd piece of syntax

2 rows × 2 columns

| Row | x2 (Int64) | x2_sqrt (Float64) |
|-----|------------|-------------------|
| 1 | 3 | 1.73205 |
| 2 | 4 | 2.0 |

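select(df,:x2,:x2=>(x->sqrt.(x))=>Symbol("x2","_sqrt")) # isn't passing a broadcasting function directly easier to understand?

2 rows × 2 columns

| Row | x2 (Int64) | x2_sqrt (Float64) |
|-----|------------|-------------------|
| 1 | 3 | 1.73205 |
| 2 | 4 | 2.0 |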
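The two spellings above produce the same table, and `ByRow` becomes more convenient once the function only makes sense for a single value. The sketch below assumes DataFrames.jl; the `:size` column and its labels are invented for illustration.

```julia
using DataFrames

df = DataFrame(x1 = [1, 2], x2 = [3, 4], y = [5, 6])

byrow = select(df, :x2, :x2 => ByRow(sqrt))                  # auto-named x2_sqrt
whole = select(df, :x2, :x2 => (x -> sqrt.(x)) => :x2_sqrt)  # explicit broadcast

# A scalar-only function: ByRow applies it to each element in turn.
labelled = select(df, :x2 => ByRow(v -> v > 3 ? "big" : "small") => :size)

println(byrow == whole)
println(labelled)
```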

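transform(df,All()=>+) # All() selects every column; + receives the columns as separate arguments

2 rows × 4 columns

| Row | x1 (Int64) | x2 (Int64) | y (Int64) | x1_x2_y_+ (Int64) |
|-----|------------|------------|-----------|-------------------|
| 1 | 1 | 3 | 5 | 9 |
| 2 | 2 | 4 | 6 | 12 |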

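transform(df,AsTable(:)=>ByRow(sum)=>:sum) # the same row sum written with ByRow

2 rows × 4 columns

| Row | x1 (Int64) | x2 (Int64) | y (Int64) | sum (Int64) |
|-----|------------|------------|-----------|-------------|
| 1 | 1 | 3 | 5 | 9 |
| 2 | 2 | 4 | 6 | 12 |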
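The row sum above can be reached in more than one way. The sketch below, assuming DataFrames.jl, compares the `AsTable` form with an explicit column list and with a plain reduction over the columns; the variable names are illustrative.

```julia
using DataFrames

df = DataFrame(x1 = [1, 2], x2 = [3, 4], y = [5, 6])

# Each row is passed to sum as a NamedTuple.
via_astable = transform(df, AsTable(:) => ByRow(sum) => :sum)

# The three column vectors are passed to + as separate arguments.
via_plus = transform(df, [:x1, :x2, :y] => (+) => :sum)

# Outside of transform: add the column vectors together directly.
via_reduce = reduce(+, eachcol(df))

println(via_astable.sum)
```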

df = DataFrame(a = ["a", "None", "b", "None"], b = 1:4, c = ["None", "j", "k", "h"], d = ["x", "y", "None", "z"])

4 rows × 4 columns

| Row | a (String) | b (Int64) | c (String) | d (String) |
|-----|------------|-----------|------------|------------|
| 1 | a | 1 | None | x |
| 2 | None | 2 | j | y |
| 3 | b | 3 | k | None |
| 4 | None | 4 | h | z |

replace!(df.a,"None"=>"meiyou") ## in-place replacement of column values, similar to replace_na
4-element Array{String,1}:
 "a"
 "meiyou"
 "b"
 "meiyou"
df

4 rows × 4 columns

| Row | a (String) | b (Int64) | c (String) | d (String) |
|-----|------------|-----------|------------|------------|
| 1 | a | 1 | None | x |
| 2 | meiyou | 2 | j | y |
| 3 | b | 3 | k | None |
| 4 | meiyou | 4 | h | z |
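If the "None" strings stand for absent values, one option is to turn them into `missing` rather than another string. This is a sketch assuming DataFrames.jl; it uses the non-mutating `replace`, because `replace!` cannot store `missing` in a column whose element type is `String`. The fallback label "unknown" is hypothetical.

```julia
using DataFrames

df = DataFrame(a = ["a", "None", "b", "None"], b = 1:4,
               c = ["None", "j", "k", "h"], d = ["x", "y", "None", "z"])

# Replace each string column with a widened copy that can hold missing.
for col in [:a, :c, :d]
    df[!, col] = replace(df[!, col], "None" => missing)
end

n_missing_a = count(ismissing, df.a)      # number of missing entries in column a
complete_d  = dropmissing(df, :d)         # keep only rows where d is present
filled_a    = coalesce.(df.a, "unknown")  # substitute a default for missing

println(df)
```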

Joining data

people = DataFrame(ID = [20, 40], Name = ["John Doe", "Jane Doe"])

2 rows × 2 columns

| Row | ID (Int64) | Name (String) |
|-----|------------|---------------|
| 1 | 20 | John Doe |
| 2 | 40 | Jane Doe |

jobs = DataFrame(ID = [20, 40], Job = ["Lawyer", "Doctor"])

2 rows × 2 columns

| Row | ID (Int64) | Job (String) |
|-----|------------|--------------|
| 1 | 20 | Lawyer |
| 2 | 40 | Doctor |

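innerjoin(people,jobs,on=:ID) ## the join functions are innerjoin, leftjoin, rightjoin and outerjoin

2 rows × 3 columns

| Row | ID (Int64) | Name (String) | Job (String) |
|-----|------------|---------------|--------------|
| 1 | 20 | John Doe | Lawyer |
| 2 | 40 | Jane Doe | Doctor |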
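With identical keys on both sides, all four join types return the same rows, so the difference only shows once the keys diverge. The sketch below assumes DataFrames.jl and changes the second job ID to 60 (with an invented "Astronaut" entry) purely for illustration.

```julia
using DataFrames

people = DataFrame(ID = [20, 40], Name = ["John Doe", "Jane Doe"])
jobs   = DataFrame(ID = [20, 60], Job = ["Lawyer", "Astronaut"])   # hypothetical data

inner = innerjoin(people, jobs, on = :ID)   # only IDs present in both tables
left  = leftjoin(people, jobs, on = :ID)    # every person, Job missing where unmatched
right = rightjoin(people, jobs, on = :ID)   # every job, Name missing where unmatched
outer = outerjoin(people, jobs, on = :ID)   # union of both key sets

println(outer)
```

Unmatched cells are filled with `missing`, so the affected columns get a `Union{Missing, String}` element type.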