Python openpyxl is a third-party library for creating and manipulating Microsoft Excel files.
Excel is one of the leading spreadsheet programs, mainly employed in performing arithmetic calculations and data analysis.
Interestingly, you can read or manipulate Excel workbooks and worksheets using Python.
There are so many third-party tools that enable you to do so easily including pyslxb, xlutils and so on, however, I will be using Python openpyxl to demonstrate how to create or manipulate Excel documents in Python.
Python openpyxl
Openpyxl is an open-source library for reading or writing Excel files. This is a third-party tool and doesn’t come with standard Python installation. To use this library, you have to first install it.
pip install openpyxl
You may also want to install pillow to enable you to work with images.
pip install pillow
Working with workbooks and worksheets
First, you have to import the Workbook class from the openpyxl module.
Then, create a Workbook object from the class Workbook.
from openpyxl import Workbook wb = Workbook() #wb for workbook
In order to create a worksheet in the workbook, you would have to make use of the create_sheet method. You may wish to provide the name of the worksheet or where you want to position the worksheet in the workbook as an argument.
ws = wb.create_sheet() ws = wb.create_sheet('mysheet') #ws for worksheet
You can indicate the position of the worksheet in the workbook with 0 being the first position, 1 being the second position and so on.
ws = wb.create_sheet('mysheet', 0) ws = wb.create_sheet('mysheet', 1) ws = wb.create_sheet('mysheet', 3)
To make the worksheet the active sheet in the workbook, you use the active method.
ws = wb.active
Finally, to save the workbook into a file, you use the save method as shown below.
wb.save('hello.xlsx')
This method requires that you provide the name or the path where the file will be stored.
Accessing and manipulating values in cells
You can access or manipulate data in Excel worksheets in various ways as shown below.
Cells
ws['A1'] = 200 #Assign cell a value print(ws['A1']) #Access a given cell in a worksheet print(ws['A1'].value) #Access the value of a cell
Row and column
ws.cell(row=1, column=1, value=100)
Multiple columns
You can access or manipulate data in multiple columns as shown below:
cols = ws['A1:D4'] print(cols) #((<Cell 'mysheet'.A1>, <Cell 'mysheet'.B1>, <Cell 'mysheet'.C1>, <Cell 'mysheet'.D1>), (<Cell 'mysheet'.A2>, <Cell 'mysheet'.B2>, <Cell 'mysheet'.C2>, <Cell 'mysheet'.D2>), (<Cell 'mysheet'.A3>, <Cell 'mysheet'.B3>, <Cell 'mysheet'.C3>, <Cell 'mysheet'.D3>), (<Cell 'mysheet'.A4>, <Cell 'mysheet'.B4>, <Cell 'mysheet'.C4>, <Cell 'mysheet'.D4>))
Cell ranges
cells = ws['A1':'B2'] print(cells) #output #((<Cell 'mysheet'.A1>, <Cell 'mysheet'.B1>), (<Cell 'mysheet'.A2>, <Cell 'mysheet'.B2>))
Row ranges
cells = ws[1:5] #row 1 to row 5 print(cells) cells = ws[3:7] #row 3 to row 7 print(cells) #outputs # ((<Cell 'mysheet'.A1>,), (<Cell 'mysheet'.A2>,), (<Cell 'mysheet'.A3>,), (<Cell 'mysheet'.A4>,), (<Cell 'mysheet'.A5>,)) # ((<Cell 'mysheet'.A3>,), (<Cell 'mysheet'.A4>,), (<Cell 'mysheet'.A5>,), (<Cell 'mysheet'.A6>,), (<Cell 'mysheet'.A7>,))
Column Range
cells = ws['A:C'] #columns from A to C print(cells) #outputs # ((<Cell 'mysheet'.A1>,), (<Cell 'mysheet'.B1>,), (<Cell 'mysheet'.C1>,)) <h3>Row number</h3> cells = ws[5] #row 5 print(cells) cells = ws[11] #row 11 print(cells) #outputs # (<Cell 'mysheet'.A5>,) # (<Cell 'mysheet'.A11>,)
Column Number
cells = ws['A'] #column A print(cells) cells = ws['B'] #column B print(cells) #outputs # (<Cell 'mysheet'.A1>,) # (<Cell 'mysheet'.B1>,)
Loading an existing workbook
To access items or values in an already existing workbook, you can use the load_workbook in the openpyxl.
from openpyxl import load_workbook wb = load_workbook('hello.xlsx') ws = wb.active items = ws['a1':'b2'] #cells you want to read for row in items: for col in row: print(col) #outputs # <Cell 'Sheet'.A1> # <Cell 'Sheet'.B1> # <Cell 'Sheet'.A2> # <Cell 'Sheet'.B2>
However, if you want to see the values contained in the cells, you have to use the attribute values as shown below
for row in items: for col in row: print(col.value) #output # 20 # 20 # 20 # 20