Working with Excel Files in Python openpyxl

Python openpyxl is a third-party library for creating and manipulating Microsoft Excel files.

Excel is one of the leading spreadsheet programs, mainly employed in performing arithmetic calculations and data analysis.

Interestingly, you can read or manipulate Excel workbooks and worksheets using Python.

There are so many third-party tools that enable you to do so easily including pyslxb, xlutils and so on, however, I will be using Python openpyxl to demonstrate how to create or manipulate Excel documents in Python.

Python openpyxl

Openpyxl is an open-source library for reading or writing Excel files. This is a third-party tool and doesn’t come with standard Python installation. To use this library, you have to first install it.

pip install openpyxl

You may also want to install pillow to enable you to work with images.

pip install pillow

Working with workbooks and worksheets

First, you have to import the Workbook class from the openpyxl module.

Then, create a Workbook object from the class Workbook.


from openpyxl import Workbook

wb = Workbook()

#wb for workbook

In order to create a worksheet in the workbook, you would have to make use of the create_sheet method. You may wish to provide the name of the worksheet or where you want to position the worksheet in the workbook as an argument.


ws = wb.create_sheet()

ws = wb.create_sheet('mysheet')

#ws for worksheet

You can indicate the position of the worksheet in the workbook with 0 being the first position, 1 being the second position and so on.


ws = wb.create_sheet('mysheet', 0)

ws = wb.create_sheet('mysheet', 1)

ws = wb.create_sheet('mysheet', 3)

To make the worksheet the active sheet in the workbook, you use the active method.


ws = wb.active

Finally, to save the workbook into a file, you use the save method as shown below.


wb.save('hello.xlsx')

This method requires that you provide the name or the path where the file will be stored.

Accessing and manipulating values in cells

You can access or manipulate data in Excel worksheets in various ways as shown below.

Cells


ws['A1'] = 200

#Assign cell a value

print(ws['A1'])

#Access a given cell in a worksheet

print(ws['A1'].value)

#Access the value of a cell

Row and column

ws.cell(row=1, column=1, value=100)

Multiple columns

You can access or manipulate data in multiple columns as shown below:


cols = ws['A1:D4']

print(cols)

#((<Cell 'mysheet'.A1>, <Cell 'mysheet'.B1>, <Cell 'mysheet'.C1>, <Cell 'mysheet'.D1>), (<Cell 'mysheet'.A2>, <Cell 'mysheet'.B2>, <Cell 'mysheet'.C2>, <Cell 'mysheet'.D2>), (<Cell 'mysheet'.A3>, <Cell 'mysheet'.B3>, <Cell 'mysheet'.C3>, <Cell 'mysheet'.D3>), (<Cell 'mysheet'.A4>, <Cell 'mysheet'.B4>, <Cell 'mysheet'.C4>, <Cell 'mysheet'.D4>))

Cell ranges


cells = ws['A1':'B2']

print(cells)

#output

#((<Cell 'mysheet'.A1>, <Cell 'mysheet'.B1>), (<Cell 'mysheet'.A2>, <Cell 'mysheet'.B2>))

Row ranges


cells = ws[1:5] #row 1 to row 5

print(cells)

cells = ws[3:7] #row 3 to row 7

print(cells)

#outputs

# ((<Cell 'mysheet'.A1>,), (<Cell 'mysheet'.A2>,), (<Cell 'mysheet'.A3>,), (<Cell 'mysheet'.A4>,), (<Cell 'mysheet'.A5>,))
# ((<Cell 'mysheet'.A3>,), (<Cell 'mysheet'.A4>,), (<Cell 'mysheet'.A5>,), (<Cell 'mysheet'.A6>,), (<Cell 'mysheet'.A7>,))

Column Range


cells = ws['A:C'] #columns from A to C

print(cells)

#outputs

# ((<Cell 'mysheet'.A1>,), (<Cell 'mysheet'.B1>,), (<Cell 'mysheet'.C1>,))
<h3>Row number</h3>
cells = ws[5] #row 5

print(cells)

cells = ws[11] #row 11

print(cells)


#outputs

# (<Cell 'mysheet'.A5>,)
# (<Cell 'mysheet'.A11>,)

Column Number


cells = ws['A'] #column A

print(cells)

cells = ws['B'] #column B

print(cells)


#outputs

# (<Cell 'mysheet'.A1>,)
# (<Cell 'mysheet'.B1>,)


Loading an existing workbook

To access items or values in an already existing workbook, you can use the load_workbook in the openpyxl.


from openpyxl import load_workbook

wb = load_workbook('hello.xlsx')

ws = wb.active

items = ws['a1':'b2']

#cells you want to read

for row in items:

    for col in  row:

    print(col)

#outputs

# <Cell 'Sheet'.A1>
# <Cell 'Sheet'.B1>
# <Cell 'Sheet'.A2>
# <Cell 'Sheet'.B2>

However, if you want to see the values contained in the cells, you have to use the attribute values as shown below


for row in items:

    for col in  row:

    print(col.value)

#output

# 20

# 20

# 20

# 20

Leave a Reply

Your email address will not be published. Required fields are marked *