Java
在使用POI进行excel操作时,当数据量较大时经常会产生内存溢出异常。下面通过分析如何解决该问题

一、POI结构图

POI大量数据读取内存溢出分析及解决方案 - 图1

二、内存溢出问题

在项目中遇到二十万行数据要写入到excel中时会内存溢出,一般方法是调大tomcat的内存,但是调到2048M还是会内存溢出报错。因此分析其原因。
通过分析其源码,得出其实现步骤为通过InputStream一行行读取到TreeMap类型的HSSFRow结构体中,因此当数据量大时就会造成内存溢出。

  1. public HSSFWorkbook(DirectoryNode directory, boolean preserveNodes)
  2. throws IOException
  3. {
  4. super(directory);
  5. String workbookName = getWorkbookDirEntryName(directory);
  6. this.preserveNodes = preserveNodes;
  7. // If we're not preserving nodes, don't track the
  8. // POIFS any more
  9. if(! preserveNodes) {
  10. clearDirectory();
  11. }
  12. _sheets = new ArrayList<HSSFSheet>(INITIAL_CAPACITY);
  13. names = new ArrayList<HSSFName>(INITIAL_CAPACITY);
  14. // Grab the data from the workbook stream, however
  15. // it happens to be spelled.
  16. InputStream stream = directory.createDocumentInputStream(workbookName);
  17. List<Record> records = RecordFactory.createRecords(stream);
  18. workbook = InternalWorkbook.createWorkbook(records);
  19. setPropertiesFromWorkbook(workbook);
  20. int recOffset = workbook.getNumRecords();
  21. // convert all LabelRecord records to LabelSSTRecord
  22. convertLabelRecords(records, recOffset);
  23. RecordStream rs = new RecordStream(records, recOffset);
  24. while (rs.hasNext()) {
  25. try {
  26. InternalSheet sheet = InternalSheet.createSheet(rs);
  27. _sheets.add(new HSSFSheet(this, sheet));
  28. } catch (UnsupportedBOFType eb) {
  29. // Hopefully there's a supported one after this!
  30. log.log(POILogger.WARN, "Unsupported BOF found of type " + eb.getType());
  31. }
  32. }
  33. for (int i = 0 ; i < workbook.getNumNames() ; ++i){
  34. NameRecord nameRecord = workbook.getNameRecord(i);
  35. HSSFName name = new HSSFName(this, nameRecord, workbook.getNameCommentRecord(nameRecord));
  36. names.add(name);
  37. }
  38. }
  39. /**
  40. * add a row to the sheet
  41. *
  42. * @param addLow whether to add the row to the low level model - false if its already there
  43. */
  44. private void addRow(HSSFRow row, boolean addLow) {
  45. _rows.put(Integer.valueOf(row.getRowNum()), row);
  46. if (addLow) {
  47. _sheet.addRow(row.getRowRecord());
  48. }
  49. boolean firstRow = _rows.size() == 1;
  50. if (row.getRowNum() > getLastRowNum() || firstRow) {
  51. _lastrow = row.getRowNum();
  52. }
  53. if (row.getRowNum() < getFirstRowNum() || firstRow) {
  54. _firstrow = row.getRowNum();
  55. }
  56. }

excel数据行读取到内存的存储结构如下:
POI大量数据读取内存溢出分析及解决方案 - 图2

三、解决方案

poi官网给了一种大批量数据写入的方法,使用SXXFWorkbook类进行大批量写入操作解决了这个问题,可以监控该样例,会发现整体内存呈现锯齿状,能够及时回收,内存相对比较平稳。

  1. package org.bird.poi;
  2. import java.io.File;
  3. import java.io.FileOutputStream;
  4. import java.io.IOException;
  5. import java.net.URL;
  6. import org.apache.poi.ss.usermodel.Cell;
  7. import org.apache.poi.ss.usermodel.Row;
  8. import org.apache.poi.ss.usermodel.Sheet;
  9. import org.apache.poi.ss.util.CellReference;
  10. import org.apache.poi.xssf.streaming.SXSSFWorkbook;
  11. import org.junit.Assert;
  12. public class XSSFWriter {
  13. private static SXSSFWorkbook wb;
  14. public static void main(String[] args) throws IOException {
  15. wb = new SXSSFWorkbook(10000);
  16. Sheet sh = wb.createSheet();
  17. for(int rownum = 0; rownum < 100000; rownum++){
  18. Row row = sh.createRow(rownum);
  19. for(int cellnum = 0; cellnum < 10; cellnum++){
  20. Cell cell = row.createCell(cellnum);
  21. String address = new CellReference(cell).formatAsString();
  22. cell.setCellValue(address);
  23. }
  24. }
  25. // Rows with rownum < 900 are flushed and not accessible
  26. for(int rownum = 0; rownum < 90000; rownum++){
  27. Assert.assertNull(sh.getRow(rownum));
  28. }
  29. // ther last 100 rows are still in memory
  30. for(int rownum = 90000; rownum < 100000; rownum++){
  31. Assert.assertNotNull(sh.getRow(rownum));
  32. }
  33. URL url = XSSFWriter.class.getClassLoader().getResource("");
  34. FileOutputStream out = new FileOutputStream(url.getPath() + File.separator + "wirter.xlsx");
  35. wb.write(out);
  36. out.close();
  37. // dispose of temporary files backing this workbook on disk
  38. wb.dispose();
  39. }
  40. }

POI大量数据读取内存溢出分析及解决方案 - 图3