using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"]) # construct a DataFrame
4 rows × 2 columns

|   | A | B |
|---|---|---|
|   | Int64 | String |
| 1 | 1 | M |
| 2 | 2 | F |
| 3 | 3 | F |
| 4 | 4 | M |

df[!,:B] = String["F","M","F","M"] # R-like indexing; ! means the column is accessed without copying
4-element Array{String,1}:
"F"
"M"
"F"
"M"
df # the original table has changed as well
4 rows × 2 columns

|   | A | B |
|---|---|---|
|   | Int64 | String |
| 1 | 1 | F |
| 2 | 2 | M |
| 3 | 3 | F |
| 4 | 4 | M |

df[:,Symbol("B")] # Julia indexes columns by Symbol, much like sym() in R
4-element Array{String,1}:
"F"
"M"
"F"
"M"
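
The difference between `!` and `:` as the row selector can be checked directly. The following is a minimal sketch, assuming a DataFrames.jl version in which `names(df)` returns strings (as in the output here); the variable names `no_copy` and `copied` are only illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

no_copy = df[!, :B]   # the vector stored inside df, no copy is made
copied  = df[:, :B]   # a freshly allocated copy of the column

no_copy === df.B      # true: df.B returns the stored vector too
copied === df.B       # false: a different object with equal contents

copied[1] = "X"       # does not touch df
no_copy[1] = "Y"      # writes through to df, so df.B[1] is now "Y"
```

Column names can also be given as strings, for example `df[:, "B"]`, so building a `Symbol` by hand is only needed when the name is computed at runtime.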
names(df) # get the column names (read-only: they cannot be reassigned this way)
2-element Array{String,1}:
"A"
"B"
propertynames(df)
2-element Array{Symbol,1}:
:A
:B
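
Since `names(df)` returns a new vector, editing it leaves the table untouched. A short sketch of renaming with `rename!` instead; the new name `ID` is a hypothetical choice for illustration.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

cols = names(df)         # a new Vector{String}
cols[1] = "ID"           # changes only the local vector
names(df)                # still "A" and "B"

rename!(df, :A => :ID)   # renames the column in place
names(df)                # now "ID" and "B"
```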
size(df) # check the dimensions
(4, 2)
push!(df,[5,"M"]) # append a row
df
5 rows × 2 columns

|   | A | B |
|---|---|---|
|   | Int64 | String |
| 1 | 1 | F |
| 2 | 2 | M |
| 3 | 3 | F |
| 4 | 4 | M |
| 5 | 5 | M |

df[!,Not(:A)] # inverse selection: every column except A
5 rows × 1 columns

|   | B |
|---|---|
|   | String |
| 1 | F |
| 2 | M |
| 3 | F |
| 4 | M |
| 5 | M |

df[df.A .> 3,:] # rows can be selected with a Boolean mask, but note the broadcasting .>
2 rows × 2 columns

|   | A | B |
|---|---|---|
|   | Int64 | String |
| 1 | 4 | M |
| 2 | 5 | M |

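
Several conditions can be combined into one mask with the broadcasting `.&`. The sketch below rebuilds the five-row table from above and compares the mask with a row-wise `filter`; the names `mask`, `by_mask`, and `by_filter` are illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:5, B = ["F", "M", "F", "M", "M"])

# Each comparison is broadcast, then the two masks are combined elementwise.
mask = (df.A .> 3) .& (df.B .== "M")
by_mask = df[mask, :]

# The same selection written as a predicate applied to each row.
by_filter = filter(row -> row.A > 3 && row.B == "M", df)

by_mask == by_filter   # true
```

The parentheses around each comparison matter, because `.&` binds more tightly than `.>` and `.==`.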
When broadcasting with `in.(items, collection)` or `items .∈ collection`, both `items` and `collection` are broadcasted over, which is often not what is intended. For example, if both arguments are vectors (and the dimensions match), the result is a vector indicating whether each value in collection `items` is in the value at the corresponding position in `collection`. To get a vector indicating whether each value in `items` is in `collection`, wrap `collection` in a tuple or a `Ref` like this: `in.(items, Ref(collection))` or `items .∈ Ref(collection)`.
in.([1,2,3,4,5],[1,2,3,4,6]) # odd syntax: without Ref, in is applied pairwise, so both vectors must have the same length
5-element BitArray{1}:
1
1
1
1
0
in.([1,2,3,4],Ref([1,2])) # wrap the collection in Ref to test each item against the whole collection, even when lengths differ
4-element BitArray{1}:
1
1
0
0
df[in.(df.A,Ref([1,2])),:] # in is at heart just a function, x -> y in x
2 rows × 2 columns

|   | A | B |
|---|---|---|
|   | Int64 | String |
| 1 | 1 | F |
| 2 | 2 | M |

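
The membership test can be spelled in a few equivalent ways. The following sketch, with illustrative variable names, compares the `Ref` form, the `∈` operator, a one-element tuple, and a comprehension, and then uses the curried `in(collection)` with `filter` on the table.

```julia
using DataFrames

items = [1, 2, 3, 4]
collection = [1, 2]

mask_ref   = in.(items, Ref(collection))       # Ref shields collection from broadcasting
mask_op    = items .∈ Ref(collection)          # same test, operator form
mask_tuple = items .∈ (collection,)            # a one-element tuple works like Ref
mask_comp  = [x in collection for x in items]  # explicit loop, no broadcasting

df = DataFrame(A = 1:5, B = ["F", "M", "F", "M", "M"])

# in(collection) returns a function equivalent to x -> x in collection.
first_two = filter(:A => in(collection), df)
```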
df = DataFrame(x1=[1, 2], x2=[3, 4], y=[5, 6])
2 rows × 3 columns

|   | x1 | x2 | y |
|---|---|---|---|
|   | Int64 | Int64 | Int64 |
| 1 | 1 | 3 | 5 |
| 2 | 2 | 4 | 6 |

select(df,Not(:x1)) # use select to pick columns
2 rows × 2 columns

|   | x2 | y |
|---|---|---|
|   | Int64 | Int64 |
| 1 | 3 | 5 |
| 2 | 4 | 6 |

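select(df,:x1,:x2=>(x->x*2)=>:x2) # an anonymous function transforms a column while selecting; => passes data along like a pipe: source column => function => target column
2 rows × 2 columns

|   | x1 | x2 |
|---|---|---|
|   | Int64 | Int64 |
| 1 | 1 | 6 |
| 2 | 2 | 8 |
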
select(df,:x2,:x2=>ByRow(sqrt)) # ByRow(f) applies f to each row; another unusual piece of syntax
2 rows × 2 columns

|   | x2 | x2_sqrt |
|---|---|---|
|   | Int64 | Float64 |
| 1 | 3 | 1.73205 |
| 2 | 4 | 2.0 |

select(df,:x2,:x2=>(x->sqrt.(x))=>Symbol("x2","_sqrt")) # isn't passing a broadcasting function easier to follow?
2 rows × 2 columns

|   | x2 | x2_sqrt |
|---|---|---|
|   | Int64 | Float64 |
| 1 | 3 | 1.73205 |
| 2 | 4 | 2.0 |

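
The two spellings above should produce the same column. A minimal sketch that checks this, and also shows `ByRow` with a function of two columns; the target name `x1_plus_x2` is made up for the example.

```julia
using DataFrames

df = DataFrame(x1 = [1, 2], x2 = [3, 4], y = [5, 6])

# ByRow wraps sqrt so that it is called once per row.
by_row = select(df, :x2 => ByRow(sqrt) => :x2_sqrt)

# Without ByRow the function receives the whole column, so it broadcasts itself.
by_col = select(df, :x2 => (x -> sqrt.(x)) => :x2_sqrt)

by_row == by_col   # true

# With several source columns, the row function takes one argument per column.
summed = select(df, [:x1, :x2] => ByRow((a, b) -> a + b) => :x1_plus_x2)
```

`ByRow` tends to read better when the function is not naturally vectorized, while the plain form is needed when the function must see the whole column, as with `sum` or `cumsum`.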
transform(df,All()=>+) # All() selects every column
2 rows × 4 columns

|   | x1 | x2 | y | x1_x2_y_+ |
|---|---|---|---|---|
|   | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | 3 | 5 | 9 |
| 2 | 2 | 4 | 6 | 12 |

transform(df,AsTable(:)=>ByRow(sum)=>:sum) # the same row sum using the ByRow syntax
2 rows × 4 columns

|   | x1 | x2 | y | sum |
|---|---|---|---|---|
|   | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | 3 | 5 | 9 |
| 2 | 2 | 4 | 6 | 12 |

df = DataFrame(a = ["a", "None", "b", "None"], b = 1:4, c = ["None", "j", "k", "h"], d = ["x", "y", "None", "z"])
4 rows × 4 columns

|   | a | b | c | d |
|---|---|---|---|---|
|   | String | Int64 | String | String |
| 1 | a | 1 | None | x |
| 2 | None | 2 | j | y |
| 3 | b | 3 | k | None |
| 4 | None | 4 | h | z |

replace!(df.a,"None"=>"meiyou") # replace values within a column, similar to replace_na in R
4-element Array{String,1}:
"a"
"meiyou"
"b"
"meiyou"
df
4 rows × 4 columns

|   | a | b | c | d |
|---|---|---|---|---|
|   | String | Int64 | String | String |
| 1 | a | 1 | None | x |
| 2 | meiyou | 2 | j | y |
| 3 | b | 3 | k | None |
| 4 | meiyou | 4 | h | z |

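
Only column `a` was changed above; `c` and `d` still contain "None". One possible way to handle the remaining columns is sketched below. Converting to `missing` is an assumption about what the placeholder is meant to represent, not something the original states.

```julia
using DataFrames

df = DataFrame(a = ["a", "None", "b", "None"], b = 1:4,
               c = ["None", "j", "k", "h"], d = ["x", "y", "None", "z"])

# In-place replacement works while the new value has the column's element type.
for col in [:a, :c]
    replace!(df[!, col], "None" => "meiyou")
end

# A String column cannot hold missing, so use the non-mutating replace,
# which widens the element type, and store the result back in the table.
df.d = replace(df.d, "None" => missing)
```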
Joining data
people = DataFrame(ID = [20, 40], Name = ["John Doe", "Jane Doe"])
2 rows × 2 columns

|   | ID | Name |
|---|---|---|
|   | Int64 | String |
| 1 | 20 | John Doe |
| 2 | 40 | Jane Doe |

jobs = DataFrame(ID = [20, 40], Job = ["Lawyer", "Doctor"])
2 rows × 2 columns

|   | ID | Job |
|---|---|---|
|   | Int64 | String |
| 1 | 20 | Lawyer |
| 2 | 40 | Doctor |

innerjoin(people,jobs,on=:ID) # the available joins include innerjoin, leftjoin, rightjoin, and outerjoin
2 rows × 3 columns

|   | ID | Name | Job |
|---|---|---|---|
|   | Int64 | String | String |
| 1 | 20 | John Doe | Lawyer |
| 2 | 40 | Jane Doe | Doctor |

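
With identical keys on both sides, all four joins return the same rows. The sketch below adds one unmatched row to each table so that the difference becomes visible; the extra rows (ID 60 "Sam Roe" and ID 80 "Farmer") are made-up data for illustration only.

```julia
using DataFrames

# Hypothetical data: ID 60 appears only in people, ID 80 only in jobs.
people = DataFrame(ID = [20, 40, 60], Name = ["John Doe", "Jane Doe", "Sam Roe"])
jobs   = DataFrame(ID = [20, 40, 80], Job = ["Lawyer", "Doctor", "Farmer"])

inner = innerjoin(people, jobs, on = :ID)   # only IDs found in both tables
left  = leftjoin(people, jobs, on = :ID)    # every row of people; Job is missing for ID 60
right = rightjoin(people, jobs, on = :ID)   # every row of jobs; Name is missing for ID 80
outer = outerjoin(people, jobs, on = :ID)   # every ID from either table
```

Columns that can receive unmatched rows are widened to allow `missing`, so `left.Job` has element type `Union{Missing, String}`. Row order after a join is best not relied upon; sort explicitly if order matters.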