[Translation] How to Access Excel Files with Ruby

Parsing Excel Files with Ruby

 BY: MATT NEDRICH    translation: Helper 7001   

In this article, I review several Ruby libraries for accessing Excel files. I discuss existing Ruby libraries that handle Excel files in different formats. The focus is on reading Excel files, with some discussion of modifying and writing them as well.

If you can't wait to see the code, head over to the project I published on GitHub. It contains code snippets for reading Excel files with the libraries mentioned in this article.

Excel file types

Before we get into the different Ruby libraries, let’s talk about Excel files. It is important to identify the type of Excel files that you are going to be using. There are two main types: legacy files and the newer OOXML file format introduced in Microsoft Office 2007.

There is a nice description of the differences on Wikipedia. The tldr; version is that the legacy file format includes files with the following extensions:

Extension   Description
.xls        Legacy Excel workbook
.xlt        Legacy Excel template
.xlm        Legacy Excel macro file

Microsoft Excel 2007 abandoned the legacy binary format and switched to the Office Open XML (OOXML) format that is used today. These files use the following extensions:

Extension   Description
.xlsx       OOXML Excel workbook
.xltx       OOXML Excel template
.xlsm       OOXML Excel workbook with macros

It is very important to determine which Excel file format (legacy or OOXML) you will be dealing with. If you work in Excel itself, you can usually convert between formats, but in my scenario the Excel files arrive from an external source in a format I cannot control, and I did not want to rely on manual conversion. Excel itself is not required either: the modern .xlsx format can generally be opened by other spreadsheet software, such as Numbers and LibreOffice.
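As a rough sketch of that first check, the extensions listed above can be mapped to a format family before a library is picked. The helper name excel_format and the two constants are made up for this illustration, and the OOXML template extension is assumed to be .xltx. An extension is only a hint: a renamed file would still need to be inspected by content.

```ruby
# Extensions for each format family, as listed in the tables above
# (assumption: .xltx is the OOXML template extension).
LEGACY_EXTENSIONS = %w[.xls .xlt .xlm].freeze
OOXML_EXTENSIONS  = %w[.xlsx .xltx .xlsm].freeze

# Hypothetical helper: returns :legacy, :ooxml, or :unknown for a path,
# based only on the file extension (case-insensitive).
def excel_format(path)
  ext = File.extname(path).downcase
  return :legacy if LEGACY_EXTENSIONS.include?(ext)
  return :ooxml  if OOXML_EXTENSIONS.include?(ext)

  :unknown
end

excel_format('reports/sales.xlsx') # => :ooxml
excel_format('old/budget.XLS')     # => :legacy
excel_format('notes.txt')          # => :unknown
```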

Ruby Excel libraries

There are many Ruby libraries for accessing Excel, perhaps too many. When I researched them, I spent a lot of time working out what each one can and cannot do. I found the following questions useful when evaluating a library:

  1. Which Excel file formats does it support?
  2. Does it support reading, writing, or both?
  3. Can it handle very large files, and how quickly?
  4. Do I have to read from a file, or can it read from a stream?

Depending on your application, some or all of these questions may matter.

Choosing the right library

The following table summarizes the features of six different Ruby libraries for accessing Excel files:

Library              License   Supports .xlsx   Supports .xls   Capability
axlsx                MIT       yes              no              write
rubyXL               MIT       yes              no              read/write
roo                  MIT       yes              yes             read
creek                MIT       yes              no              read
spreadsheet          GPLv3     no               yes             read/write
simple_xlsx_reader   MIT       yes              no              read

Depending on your needs, one or more of these libraries may be able to help. Consider the following usage scenarios:

Writing .xlsx files

If you need to write .xlsx files, axlsx is a good choice. It supports writing cell values and generating charts. If you need a lightweight library, rubyXL is also a good option.

Reading .xlsx files

If you only need to read .xlsx files, you can choose among rubyXL, roo, creek and simple_xlsx_reader. roo is a very popular choice because it also supports the legacy .xls format. However, if you care about speed, creek and simple_xlsx_reader are clearly better at handling large files. If you want to read data from an IO stream (rather than from a file), rubyXL is the only choice.

Reading and writing .xlsx files

If you need to both read and write .xlsx files, you have two options. You can use rubyXL, which supports both. Alternatively, you can use two different libraries, one for reading and one for writing.

Reading and writing legacy Excel files

Supporting the legacy .xls format is more constrained. If you only need to support .xls, I recommend spreadsheet, which supports both reading and writing. If you also need to support the .xlsx format, I would recommend bringing in a second gem for that, unless you only need to read, in which case you can choose roo, which reads both the legacy and the modern formats.
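The scenarios above amount to filtering the feature table by what you need. A minimal sketch of that lookup follows. The data mirrors the table in this article, while the LIBRARIES constant and the candidate_libraries helper are names invented for this example and are not part of any gem.

```ruby
# Feature table from this article, encoded as data.
LIBRARIES = [
  { name: 'axlsx',              xlsx: true,  xls: false, read: false, write: true  },
  { name: 'rubyXL',             xlsx: true,  xls: false, read: true,  write: true  },
  { name: 'roo',                xlsx: true,  xls: true,  read: true,  write: false },
  { name: 'creek',              xlsx: true,  xls: false, read: true,  write: false },
  { name: 'spreadsheet',        xlsx: false, xls: true,  read: true,  write: true  },
  { name: 'simple_xlsx_reader', xlsx: true,  xls: false, read: true,  write: false }
].freeze

# Hypothetical helper: returns the names of the libraries that support
# every requested feature (:xlsx, :xls, :read, :write).
def candidate_libraries(*needs)
  LIBRARIES.select { |lib| needs.all? { |need| lib[need] } }
           .map { |lib| lib[:name] }
end

candidate_libraries(:xlsx, :read, :write) # => ["rubyXL"]
candidate_libraries(:xls, :read)          # => ["roo", "spreadsheet"]
```

The table does not capture speed or stream support, so those still have to be weighed separately.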

The good news is that whichever library you end up choosing, the code to open and read a file is simple and looks very similar from one library to the next. For example, here is the code using creek.

require 'creek'

workbook = Creek::Book.new 'path/to/file.xlsx'
worksheets = workbook.sheets

worksheets.each do |worksheet|
  worksheet.rows.each do |row|
    row_cells = row.values
    # do something with row_cells
  end
end


The project I published on GitHub has sample code for reading .xlsx files with each of these libraries.

Performance

If you need to read very large Excel files, you may want to compare the performance of the libraries. I put together a quick and somewhat dirty benchmark that tests the four libraries in the table above that can read the .xlsx format.

I created sample .xlsx files containing 500, 10,000, 50,000, 200,000 and 500,000 rows of data. I then ran code to read each file (that is, to read every row in the file). The code used to read each sample file with each library is available here.

I ran each library against each file three times and recorded the average time (the time varied little from run to run).
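The averaging step can be sketched with Ruby's standard Benchmark module. This is not the original benchmark code: the average_time helper is a name invented for this example, and the block below uses a stand-in workload where a real comparison would open a sample file with one library and iterate over every row.

```ruby
require 'benchmark'

# Hypothetical helper: runs the given block `runs` times and returns the
# mean wall-clock time in seconds.
def average_time(runs: 3)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.sum / runs
end

# Stand-in workload; replace the block body with the code that reads
# every row of a sample file using the library under test.
seconds = average_time { 100_000.times.map { |i| i * 2 } }
puts format('average over 3 runs: %.4f s', seconds)
```

Wall-clock timing includes file opening and parsing, which is what matters when comparing end-to-end read times.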

rubyXL and roo performed about the same, taking more than two minutes to read the 500,000-row Excel file. creek and simple_xlsx_reader were both much faster, reading the 500,000-row file in under a minute.

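I hope this article gives you some guidance on accessing Excel files with Ruby. If you are using a library that I did not mention and you like it, please let me know.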

Origin www.cnblogs.com/dajianshi/p/11613060.html