Importing Excel files with the readxl package
- It has two main functions:
excel_sheets() and read_excel()
excel_sheets()
Lists the names of the worksheets in an Excel file.

    # Load the readxl package
    library(readxl)

    # Print the names of all worksheets
    excel_sheets("urbanpop.xlsx")

read_excel()
Reads the contents of an Excel worksheet into R.

    # The readxl package is already loaded
    # Read the sheets, one by one
    pop_1 <- read_excel("urbanpop.xlsx", sheet = 1)
    pop_2 <- read_excel("urbanpop.xlsx", sheet = 2)
    pop_3 <- read_excel("urbanpop.xlsx", sheet = 3)

    # Put pop_1, pop_2 and pop_3 in a list: pop_list
    pop_list <- list(pop_1, pop_2, pop_3)

    # Display the structure of pop_list
    str(pop_list)

- With lapply(), the sheet names returned by excel_sheets() can be passed straight to read_excel().

    # The readxl package is already loaded
    # Read all Excel sheets with lapply(): pop_list
    pop_list <- lapply(excel_sheets("urbanpop.xlsx"), read_excel, path = "urbanpop.xlsx")

    # Display the structure of pop_list
    str(pop_list)

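Some arguments
Default settings: col_names = TRUE, col_types = NULL (types are guessed), skip = 0; when sheet is not given, the first worksheet is read.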

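col_types
Takes a character vector with one type per column, such as "text", "numeric", "date", or "blank" (called "skip" in newer readxl releases), which drops the column.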
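As a rough illustration of col_types, the sketch below reads a worksheet with an explicit type for each column. It assumes the readxl package is installed and uses the example workbook bundled with readxl (readxl_example("datasets.xlsx")) instead of the urbanpop file from these notes; the object name iris_xl is made up for the example.

```r
library(readxl)

# Example workbook shipped with readxl (an assumption: not the urbanpop file)
path <- readxl_example("datasets.xlsx")

# One type per column of the "iris" sheet: four numeric columns, one text column
iris_xl <- read_excel(
  path,
  sheet = "iris",
  col_types = c("numeric", "numeric", "numeric", "numeric", "text")
)

# Inspect the classes that were assigned to each column
sapply(iris_xl, class)
```

If the vector had the wrong length for the sheet, read_excel() would be expected to complain, so counting the columns first is a sensible habit.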
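sheet
Selects which worksheet of the Excel file to read, either by position or by name.
skip
Like the skip argument of the readr package covered earlier: the number of rows to skip before reading starts.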
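A small sketch of sheet and skip together, again assuming readxl is installed and using its bundled example workbook rather than urbanpop.xlsx. The object names are hypothetical. Because skip also skips the header row, col_names = FALSE is set so that the first remaining data row is not mistaken for column names.

```r
library(readxl)

# Example workbook shipped with readxl (an assumption: not the urbanpop file)
path <- readxl_example("datasets.xlsx")

# Select the first worksheet by position ...
first_by_pos <- read_excel(path, sheet = 1)
# ... or by name, taking the name from excel_sheets()
first_by_name <- read_excel(path, sheet = excel_sheets(path)[1])

# Both calls should return the same data
identical(first_by_pos, first_by_name)

# Skip the header row plus the first two data rows
skipped <- read_excel(path, sheet = 1, skip = 3, col_names = FALSE)

# Compare the number of rows with and without skipping
nrow(first_by_pos)
nrow(skipped)
```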
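col_names
Defaults to TRUE, meaning the first row is used as the column names and the function does not generate names itself. Pass a character vector to supply your own names, or set it to FALSE to let the function name the columns automatically.
A handy trick here: use paste0() to build a batch of names.
paste0("a", 0:10) generates "a0", "a1", …, "a10". Plain paste("a", 0:10) would insert a space and give "a 0", "a 1", and so on.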
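To see how the two functions differ, here is a short base R sketch (no packages needed). The variable names are made up for the example. paste() joins its arguments with a space unless sep is changed, while paste0() joins them with no separator.

```r
# paste() uses sep = " " by default
with_space <- paste("a", 0:3)

# paste0() concatenates without any separator
no_space <- paste0("a", 0:3)

# paste() with sep = "" behaves like paste0()
same_as_paste0 <- paste("a", 0:3, sep = "")

print(with_space)
print(no_space)
print(identical(no_space, same_as_paste0))

# Building column names in bulk, e.g. for col_names
cols <- c("country", paste0("year_", 1960:1966))
print(cols)
```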

    # The readxl package is already loaded
    # Import the first Excel sheet of urbanpop_nonames.xlsx (R gives names): pop_a
    pop_a <- read_excel("urbanpop_nonames.xlsx", col_names = FALSE)

    # Import the first Excel sheet of urbanpop_nonames.xlsx (specify col_names): pop_b
    cols <- c("country", paste0("year_", 1960:1966))
    pop_b <- read_excel("urbanpop_nonames.xlsx", col_names = cols)

    # Print the summary of pop_a
    summary(pop_a)

    # Print the summary of pop_b
    summary(pop_b)

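Another way to import Excel files: the gdata package
How gdata works:
read.xls() uses a Perl script to convert the Excel sheet to a CSV file, which is then read with read.csv() (itself a wrapper around read.table()). A working Perl installation is therefore needed.
gdata compared with readxl
gdata is covered here mainly because readxl is still under development: many features are incomplete and its syntax may change.
Learning the mature gdata package is therefore the safer choice.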
Importing files with read.xls()

    # Load the gdata package
    library(gdata)

    # Import the second sheet of urbanpop.xls: urban_pop
    urban_pop <- read.xls("urbanpop.xls", sheet = "1967-1974")

- cbind() appends data frames or matrices column by column.
- data_frame[-1] drops the first column.
- na.omit() removes the rows of a data frame that contain NA values.
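The three helpers above can be tried on toy data in base R before touching a real workbook. This is only a sketch: sheet1 and sheet2 are hypothetical stand-ins for two worksheets that share a country column.

```r
# Hypothetical stand-ins for two worksheets with a shared first column
sheet1 <- data.frame(country = c("A", "B", "C"), y1960 = c(10, NA, 30))
sheet2 <- data.frame(country = c("A", "B", "C"), y1961 = c(11, 21, 31))

# sheet2[-1] drops the duplicated country column before binding side by side
combined <- cbind(sheet1, sheet2[-1])

# na.omit() drops every row that contains at least one NA (here the row for "B")
clean <- na.omit(combined)

print(combined)
print(clean)
```

Note that cbind() matches rows purely by position, so this only makes sense when the sheets list the countries in the same order.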
Example

    # Add code to import data from all three sheets in urbanpop.xls
    path <- "urbanpop.xls"
    urban_sheet1 <- read.xls(path, sheet = 1, stringsAsFactors = FALSE)
    urban_sheet2 <- read.xls(path, sheet = 2, stringsAsFactors = FALSE)
    urban_sheet3 <- read.xls(path, sheet = 3, stringsAsFactors = FALSE)

    # Extend the cbind() call to include urban_sheet3: urban
    urban <- cbind(urban_sheet1, urban_sheet2[-1], urban_sheet3[-1])

    # Remove all rows with NAs from urban: urban_clean
    urban_clean <- na.omit(urban)

    # Print out a summary of urban_clean
    summary(urban_clean)

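A package that bridges Excel and R: XLConnect
A Java-based package (installation may require a Java environment).
It lets you do almost anything from R code that you could do in Excel itself.
loadWorkbook()
Loads an Excel file. It creates a workbook object in R that acts as the link between the Excel file and the R workspace; the result can be assigned to a variable.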

    # Load the XLConnect package
    library(XLConnect)

    # Build connection to urbanpop.xlsx: my_book
    my_book <- loadWorkbook("urbanpop.xlsx")

getSheets()
Lists all the worksheets in the Excel file.

    getSheets(my_book)

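readWorksheet()
Reads the contents of a worksheet.
readWorksheet() is typically called with four arguments: object is the workbook object, normally created with loadWorkbook(); sheet identifies the worksheet, by name or by position; startCol is the first column to read and endCol is the last column to read.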

    # XLConnect is already available
    # Build connection to urbanpop.xlsx
    my_book <- loadWorkbook("urbanpop.xlsx")

    # Import columns 3, 4, and 5 from second sheet in my_book: urbanpop_sel
    urbanpop_sel <- readWorksheet(my_book, sheet = 2, startCol = 3, endCol = 5)

    # Import first column from second sheet in my_book: countries
    countries <- readWorksheet(my_book, sheet = 2, startCol = 1, endCol = 1)

    # cbind() urbanpop_sel and countries together: selection
    selection <- cbind(countries, urbanpop_sel)

Modifying data with XLConnect
createSheet()
createSheet(object, name = )
Creates an empty worksheet.

    # Add a worksheet to my_book, named "data_summary"
    createSheet(my_book, name = "data_summary")

writeWorksheet()
Writes a data frame into a worksheet.
writeWorksheet(object, data, sheet = )

    # Add data in summ to "data_summary" sheet
    # (summ is assumed to be a data frame created earlier)
    writeWorksheet(my_book, summ, sheet = "data_summary")

saveWorkbook()
Once all edits are done, call this function to save the file. (Just as in Excel itself: if you do not save after making changes, all the work is lost.)
saveWorkbook(object, file = )

    # Save workbook as summary.xlsx
    saveWorkbook(my_book, file = "summary.xlsx")

renameSheet()
Renames a worksheet. The sheet can be identified by its old name or by its position.
renameSheet(object, "old_name", "new_name")

    # Rename "data_summary" sheet (the fourth sheet) to "summary"
    renameSheet(my_book, sheet = 4, "summary")

removeSheet()
Removes an entire worksheet.
removeSheet(object, sheet = )

    # Remove the fourth sheet
    removeSheet(my_book, sheet = 4)
