Efficient Ways to Read Excel (.xlsx) Files in Python, with a Performance Comparison

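In data analysis and processing, you often need to read data from Excel files, especially in the `.xlsx` format. Python offers several libraries for this, each with its own strengths and weaknesses. This article introduces several commonly used libraries and compares their performance to help you choose the one that best fits your needs.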

1. Using the `openpyxl` library:

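`openpyxl` is a pure-Python library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files. It is feature-rich and supports many Excel features, such as cell styles, formulas, and charts. However, when you only need to read data, `openpyxl` may be slower than some other libraries, because in its default mode it loads the whole workbook into memory as cell objects.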

The following example reads the data in an `.xlsx` file with `openpyxl`:

```python
from openpyxl import load_workbook


def read_xlsx_openpyxl(filepath):
    """
    Read the data in an xlsx file with openpyxl.

    Args:
        filepath: Path to the xlsx file.

    Returns:
        A dict mapping each sheet name to a list of rows, where each row
        is a list of cell values. Returns None if the file does not exist
        or cannot be opened.
    """
    try:
        # data_only=True returns the cached computed values instead of formulas
        workbook = load_workbook(filepath, data_only=True)
        data = {}
        for sheet_name in workbook.sheetnames:
            sheet = workbook[sheet_name]
            rows = []
            for row in sheet.iter_rows():
                row_data = [cell.value for cell in row]
                rows.append(row_data)
            data[sheet_name] = rows
        return data
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
        return None
    except Exception as e:
        print(f"Error reading file: {e}")
        return None


# Example usage
filepath = ""  # replace with the path to your file
data = read_xlsx_openpyxl(filepath)
if data:
    for sheet_name, sheet_data in data.items():
        print(f"Sheet: {sheet_name}")
        for row in sheet_data:
            print(row)
```
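
Because this section notes that `openpyxl` can be slow when you only need the data, it may help to try the library's read-only mode, which streams rows instead of building the full workbook in memory. The following is a minimal sketch under that assumption; the function name `read_xlsx_openpyxl_readonly` is hypothetical, and error handling is left out to keep the example short.

```python
from openpyxl import load_workbook


def read_xlsx_openpyxl_readonly(filepath):
    """Return {sheet name: list of rows} using openpyxl's read-only mode."""
    # read_only=True loads rows lazily instead of building every cell up front
    workbook = load_workbook(filepath, read_only=True, data_only=True)
    try:
        data = {}
        for sheet in workbook.worksheets:
            # values_only=True yields plain tuples of values, not cell objects
            data[sheet.title] = [
                list(row) for row in sheet.iter_rows(values_only=True)
            ]
        return data
    finally:
        # a read-only workbook keeps the file open until it is closed
        workbook.close()
```

The return shape matches `read_xlsx_openpyxl`, so the two can be swapped and compared on the same file. Whether read-only mode is actually faster depends on the workbook, so treat it as something to measure rather than a guarantee.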

2. Using the `pandas` library:

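`pandas` is a mainstay of data analysis in Python. Its `DataFrame` structure can be populated directly from an `.xlsx` file with `read_excel`. `pandas` does not parse the file itself: it delegates to an underlying engine such as `openpyxl` (for `.xlsx` files) or `xlrd` (for `.xls` files), so raw reading speed is largely determined by that engine. What `pandas` adds is convenience: the data arrives as a `DataFrame` with typed columns, ready for filtering, aggregation, and other processing.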
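
Relative speed depends on the file size, the engine, and the options passed, so it is worth measuring on your own data. The following is a minimal sketch of a timing helper that uses only the standard library; `time_reader` and its parameters are hypothetical names introduced for illustration, not part of `pandas` or `openpyxl`.

```python
import time
from statistics import median


def time_reader(reader, filepath, repeats=3):
    """Return the median wall-clock time, in seconds, of reader(filepath)."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        reader(filepath)  # the result is discarded; only the time is kept
        durations.append(time.perf_counter() - start)
    # the median is less sensitive to one slow run than the mean
    return median(durations)
```

Any function that takes a file path can be passed as `reader`, for example `time_reader(read_xlsx_openpyxl, path)` and `time_reader(read_xlsx_pandas, path)` with the same `path`. The reader functions in this article catch exceptions and return `None`, so check that the file actually loads before trusting the timings.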

The following example reads the data in an `.xlsx` file with `pandas`:

```python
import pandas as pd


def read_xlsx_pandas(filepath):
    """
    Read the data in an xlsx file with pandas.

    Args:
        filepath: Path to the xlsx file.

    Returns:
        A pandas DataFrame for the first sheet, or None if the file does
        not exist or cannot be opened.
    """
    try:
        # engine='openpyxl' selects the parser used for xlsx files
        df = pd.read_excel(filepath, engine='openpyxl')
        return df
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
        return None
    except Exception as e:
        print(f"Error reading file: {e}")
        return None


# Example usage
filepath = ""  # replace with the path to your file
df = read_xlsx_pandas(filepath)
if df is not None:
    print(df)
```
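
By default `read_excel` returns only the first sheet, whereas the `openpyxl` example reads every sheet. To get comparable output from `pandas`, one option is to pass `sheet_name=None`, which returns a dict of DataFrames keyed by sheet name. The following is a minimal sketch; the function name `read_all_sheets_pandas` is hypothetical, and error handling is omitted for brevity.

```python
import pandas as pd


def read_all_sheets_pandas(filepath):
    """Return a dict mapping each sheet name to its DataFrame."""
    # sheet_name=None asks read_excel for every sheet, not just the first
    return pd.read_excel(filepath, sheet_name=None, engine="openpyxl")
```

Each DataFrame treats the first row of its sheet as the header, which is the `read_excel` default; pass `header=None` if the sheets have no header row.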