Creating New Columns Based on Existing Columns
In addition to the .mean(), .median(), .describe(), and .agg() methods, we can also use arithmetic operators to perform calculations on values in a DataFrame.
Let’s introduce a new sample dataset, this time with air quality data for measurement stations in London, Paris, and Antwerp. Values in this dataset include nitrogen dioxide (NO2) concentration expressed as parts per million (ppm).
import pandas as pd, numpy as np # import statements
df = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch7/air_quality_no2.csv", index_col=0, parse_dates=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1035 entries, 2019-05-07 02:00:00 to 2019-06-21 02:00:00
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 station_antwerp 95 non-null float64
1 station_paris 1004 non-null float64
2 station_london 969 non-null float64
dtypes: float64(3)
memory usage: 32.3 KB
Example #1
Let’s say we wanted to express the London station’s NO2 concentration as milligrams per cubic meter (mg/m3). For our purposes, we are assuming a temperature of 25 degrees Celsius and pressure of 1013 hPa, which means the conversion factor is 1.882.
We would need to convert all of the station_london column values from ppm to mg/m3 and store the results of that calculation in a newly created column.
df['london_mg_per_cubic'] = df['station_london'] * 1.882 # create new column from arithmetic operation
df.head() # show output
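As a point of comparison, the same conversion can be sketched with the DataFrame.assign() method, which returns a new DataFrame and leaves the original unchanged. The small sample_df below is a hypothetical stand-in that uses a few London values rather than the full dataset, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the station_london column (not the full dataset)
sample_df = pd.DataFrame({"station_london": [23.0, 19.0, 16.0, np.nan]})

# Conversion factor stated in the text (25 degrees Celsius, 1013 hPa)
CONVERSION_FACTOR = 1.882

# assign() returns a copy that includes the new column; sample_df is not modified
converted = sample_df.assign(
    london_mg_per_cubic=sample_df["station_london"] * CONVERSION_FACTOR
)

print(converted)
```

Missing values stay missing: the last row of london_mg_per_cubic is NaN because the corresponding station_london value is NaN.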
Example #2
Let’s say we wanted to calculate the ratio of the Paris station values to the Antwerp station values. We would need to calculate the ratio for each row and store the results in a new column.
df["ratio_paris_antwerp"] = (df["station_paris"] / df["station_antwerp"]) # calculate ratio
df.head() # show output
datetime | station_antwerp | station_paris | station_london | ratio_paris_antwerp
---|---|---|---|---
2019-05-07 02:00:00 | NaN | NaN | 23.0 | NaN
2019-05-07 03:00:00 | 50.5 | 25.0 | 19.0 | 0.495050
2019-05-07 04:00:00 | 45.0 | 27.7 | 19.0 | 0.615556
2019-05-07 05:00:00 | NaN | 50.4 | 16.0 | NaN
2019-05-07 06:00:00 | NaN | 61.9 | NaN | NaN
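The NaN entries in the ratio column follow from how pandas handles missing data: if either operand in a row is NaN, the result for that row is NaN. A minimal sketch, using a hypothetical small_df built from the five rows displayed above instead of the downloaded file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in built from the five rows displayed above
small_df = pd.DataFrame(
    {
        "station_antwerp": [np.nan, 50.5, 45.0, np.nan, np.nan],
        "station_paris": [np.nan, 25.0, 27.7, 50.4, 61.9],
    }
)

# Row-by-row division; a NaN in either column gives NaN in the result
small_df["ratio_paris_antwerp"] = small_df["station_paris"] / small_df["station_antwerp"]

# Count how many rows have a usable ratio
valid_ratios = small_df["ratio_paris_antwerp"].notna().sum()

print(small_df)
print("rows with a ratio:", valid_ratios)
```

Only the rows where both stations reported a value produce a ratio, which is worth keeping in mind before summarizing the new column.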
Element-Wise Calculation
Note that we do not need to iterate over all rows for a specific dataframe column to perform this calculation. pandas performs the calculation element-wise, that is, on all of the values in the column at once.
In example #2, because this is an element-wise calculation, the division operation is applied to all rows in the data frame. Python’s other mathematical (+, -, *, /) and comparison (<, >, ==, etc.) operators all work element-wise.
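To make the element-wise behavior concrete, the sketch below compares an explicit loop with the equivalent operator expression and then applies comparison operators to a whole column. The paris Series is a hypothetical stand-in with a few values from the table above.

```python
import pandas as pd

# Hypothetical stand-in for a few station_paris values
paris = pd.Series([25.0, 27.7, 50.4, 61.9], name="station_paris")

# Explicit iteration: works, but is unnecessary
looped = pd.Series([value * 2 for value in paris], name="station_paris")

# Element-wise arithmetic: the operator is applied to every value at once
vectorized = paris * 2
same_result = bool((looped == vectorized).all())

# Comparison operators are also element-wise and return a boolean Series
above_40 = paris > 40
equal_25 = paris == 25.0

# A boolean Series can then be used to filter the original values
high_values = paris[above_40]

print(same_result)
print(above_40.tolist())
print(high_values.tolist())
```

Note that the equality test uses ==, since a single = is assignment in Python.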