Creating New Columns Based on Existing Columns

In addition to methods such as .mean(), .median(), .describe(), and .agg(), we can use arithmetic operators to perform calculations on the values in a DataFrame.

Let’s introduce a new sample dataset, this time with air quality data for measurement stations in London, Paris, and Antwerp. Values in this dataset include nitrogen dioxide (NO2) concentration expressed as parts per million (ppm).

import pandas as pd # data analysis library
import numpy as np # numerical library
df = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch7/air_quality_no2.csv", index_col=0, parse_dates=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1035 entries, 2019-05-07 02:00:00 to 2019-06-21 02:00:00
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   station_antwerp  95 non-null     float64
 1   station_paris    1004 non-null   float64
 2   station_london   969 non-null    float64
dtypes: float64(3)
memory usage: 32.3 KB

Example #1

Let’s say we wanted to express the London station’s NO2 concentration as milligrams per cubic meter (mg/m3). For our purposes, we are assuming a temperature of 25 degrees Celsius and pressure of 1013 hPa, which means the conversion factor is 1.882.

We would need to multiply every value in the station_london column by the conversion factor to go from ppm to mg/m3, and store the results in a newly created column.

df['london_mg_per_cubic'] = df['station_london'] * 1.882 # create new column from arithmetic operation
df.head() # show output
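
To see what this produces without downloading the full dataset, here is a minimal sketch on a hand-built DataFrame. The name `sample` is hypothetical, the five London readings are the ones that appear in the first rows of the dataset, and the 1.882 factor is the one assumed above.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame holding five London readings (one is missing)
sample = pd.DataFrame({"station_london": [23.0, 19.0, 19.0, 16.0, np.nan]})

# One expression converts every row; no explicit loop is needed
sample["london_mg_per_cubic"] = sample["station_london"] * 1.882

print(sample)
```

The first reading, 23.0, becomes approximately 43.286, and the missing reading stays missing (NaN) in the new column.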

Example #2

Let’s say we wanted to calculate the ratio of the Paris station values to the Antwerp station values for each row and store the result in a new column.

df["ratio_paris_antwerp"] = df["station_paris"] / df["station_antwerp"] # calculate ratio
df.head() # show output
                     station_antwerp  station_paris  station_london  ratio_paris_antwerp
datetime
2019-05-07 02:00:00              NaN            NaN            23.0                  NaN
2019-05-07 03:00:00             50.5           25.0            19.0             0.495050
2019-05-07 04:00:00             45.0           27.7            19.0             0.615556
2019-05-07 05:00:00              NaN           50.4            16.0                  NaN
2019-05-07 06:00:00              NaN           61.9             NaN                  NaN
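
The NaN entries in the ratio column follow from how pandas handles missing data in arithmetic: if either operand is missing, the result is missing. The sketch below rebuilds the five rows shown above by hand (the names `sample` and `valid` are hypothetical) and then, as one possible follow-up, keeps only the rows where a ratio could be computed.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame with the Antwerp and Paris values from the output above
sample = pd.DataFrame({
    "station_antwerp": [np.nan, 50.5, 45.0, np.nan, np.nan],
    "station_paris": [np.nan, 25.0, 27.7, 50.4, 61.9],
})

# Row-by-row division; a missing value on either side yields NaN
sample["ratio_paris_antwerp"] = sample["station_paris"] / sample["station_antwerp"]
print(sample["ratio_paris_antwerp"].isna().tolist())

# Optional: drop the rows where the ratio is missing
valid = sample.dropna(subset=["ratio_paris_antwerp"])
print(valid)
```

Whether to drop, keep, or fill those rows depends on the analysis; dropping is shown here only as an illustration.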

Element-Wise Calculation

Note that we do not need to iterate over the rows of a DataFrame column to perform this calculation. pandas performs the calculation element-wise, that is, on all of the values in the column at once.
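
As a rough illustration of what "element-wise" saves us, the sketch below computes the same conversion twice: once with an explicit loop and once with a single expression. The names `london`, `looped`, and `vectorized` are hypothetical, and the four values are sample London readings.

```python
import pandas as pd

# Hypothetical column of four London readings
london = pd.Series([23.0, 19.0, 19.0, 16.0], name="station_london")

# Explicit loop: multiply one value at a time and rebuild the column
looped = pd.Series([value * 1.882 for value in london], name="station_london")

# Element-wise: one expression covers the whole column
vectorized = london * 1.882

# Both approaches give the same values
print(vectorized.equals(looped))
```

The element-wise form is shorter and, on large columns, usually much faster, because the work happens inside pandas rather than in a Python loop.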

In example #2, because this is an element-wise calculation, the division operation is applied to every row in the DataFrame. The other arithmetic operators (+, -, *, /) and the comparison operators (<, >, ==, etc.) also work element-wise. Note that equality is tested with ==, since a single = is assignment in Python.
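
Comparison operators behave the same way: applied to a column, they return one True/False result per row, and that result can be used to filter the column. A minimal sketch follows, where the names `london`, `above_18`, and `is_19` and the threshold of 18 are chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column of London readings, including one missing value
london = pd.Series([23.0, 19.0, 19.0, 16.0, np.nan], name="station_london")

above_18 = london > 18   # boolean Series, one result per row
is_19 = london == 19.0   # == tests equality; a single = would be assignment

print(above_18.tolist())
print(is_19.tolist())

# A boolean Series can select the matching rows
print(london[above_18].tolist())
```

Comparisons against a missing value evaluate to False, so the NaN row is excluded from the filtered result.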