{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyP+rgFNnGFA3IN0RbmBwpZK"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Creating New Columns Based on Existing Columns\n","\n","In addition to methods like `.mean()`, `.median()`, `.describe()`, and `.agg()`, we can use arithmetic operators to perform calculations on the values in a `DataFrame`.\n","\n","Let's introduce a new sample dataset, this time with air quality data from measurement stations in London, Paris, and Antwerp. The values in this dataset are nitrogen dioxide (NO2) concentrations, expressed in parts per million (`ppm`).\n"],"metadata":{"id":"EsEkwMtRcTJx"}},{"cell_type":"code","source":["import pandas as pd # data analysis library\n","import numpy as np # numerical computing library\n","df = pd.read_csv(\"https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch7/air_quality_no2.csv\", index_col=0, parse_dates=True) # parse the first column as a DatetimeIndex\n","df.info() # show the index range, columns, non-null counts, and dtypes"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Or3qMl6ecnAA","executionInfo":{"status":"ok","timestamp":1705965183471,"user_tz":300,"elapsed":975,"user":{"displayName":"Katherine Walden","userId":"17094108395123900917"}},"outputId":"900619bc-ecfb-4463-9f59-f93c78035da5"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","DatetimeIndex: 1035 entries, 2019-05-07 02:00:00 to 2019-06-21 02:00:00\n","Data columns (total 3 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------- ----- \n"," 0 station_antwerp 95 non-null float64\n"," 1 station_paris 1004 non-null float64\n"," 2 station_london 969 non-null float64\n","dtypes: float64(3)\n","memory usage: 32.3 KB\n"]}]},{"cell_type":"markdown","source":["## Example #1\n","\n","Let's say we wanted to express the London station's NO2 concentration as milligrams per cubic meter (mg/m3).
For our purposes, we assume a temperature of 25 degrees Celsius and a pressure of 1013 hPa, which gives a conversion factor of 1.882.\n","\n","
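A minimal sketch of where a factor of 1.882 can come from, using the standard conversion `mg/m3 = ppm * molar mass / molar volume`. It assumes the conventional molar volume of 24.45 L/mol for an ideal gas at 25 degrees Celsius and about 1 atm (1013 hPa), plus atomic masses of 14.007 for nitrogen and 15.999 for oxygen. The names `MOLAR_VOLUME_L` and `MOLAR_MASS_NO2` and the three sample readings are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Assumption: conventional molar volume of an ideal gas at 25 C and ~1 atm, in L/mol
MOLAR_VOLUME_L = 24.45
# Molar mass of NO2 in g/mol: one nitrogen (14.007) plus two oxygen (15.999 each)
MOLAR_MASS_NO2 = 14.007 + 2 * 15.999

# mg/m3 = ppm * molar mass / molar volume, so the factor is their ratio
factor = MOLAR_MASS_NO2 / MOLAR_VOLUME_L
print(round(factor, 3))  # 1.882

# Hypothetical sample readings in ppm; multiplying a Series by a number
# applies the conversion to every value at once
ppm = pd.Series([23.0, 19.0, 16.0])
mg_per_m3 = ppm * factor
print(mg_per_m3.round(2))
```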

\n","\n","We would need to convert every value in the `station_london` column from `ppm` to mg/m3 and store the results in a new column."],"metadata":{"id":"DuAeQLgccvtN"}},{"cell_type":"code","source":["df['london_mg_per_cubic'] = df['station_london'] * 1.882 # convert every London value and store the results in a new column\n","df.head() # show output"],"metadata":{"id":"caYbjmPwc1iO"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Example #2\n","\n","

\n","\n","Let's say we wanted to calculate the ratio of the Paris station values to the Antwerp station values. We would need to calculate the ratio for each row and store the results in a new column. Where either station has a missing value (`NaN`), the ratio is also `NaN`."],"metadata":{"id":"R8DT1P2udKo8"}},{"cell_type":"code","source":["df[\"ratio_paris_antwerp\"] = (df[\"station_paris\"] / df[\"station_antwerp\"]) # divide Paris values by Antwerp values, row by row\n","df.head() # show output"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":344},"id":"CkSV1sERdNuR","executionInfo":{"status":"ok","timestamp":1705965335856,"user_tz":300,"elapsed":137,"user":{"displayName":"Katherine Walden","userId":"17094108395123900917"}},"outputId":"fff74e21-534b-461d-e871-f3fc3c2b6765"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" station_antwerp station_paris station_london \\\n","datetime \n","2019-05-07 02:00:00 NaN NaN 23.0 \n","2019-05-07 03:00:00 50.5 25.0 19.0 \n","2019-05-07 04:00:00 45.0 27.7 19.0 \n","2019-05-07 05:00:00 NaN 50.4 16.0 \n","2019-05-07 06:00:00 NaN 61.9 NaN \n","\n"," ratio_paris_antwerp \n","datetime \n","2019-05-07 02:00:00 NaN \n","2019-05-07 03:00:00 0.495050 \n","2019-05-07 04:00:00 0.615556 \n","2019-05-07 05:00:00 NaN \n","2019-05-07 06:00:00 NaN "],"text/html":["\n","
\n"]},"metadata":{},"execution_count":2}]},{"cell_type":"markdown","source":["## Element-Wise Calculation\n","\n","Note that we do not need to loop over the rows of a column to perform these calculations. `pandas` performs them element-wise, that is, on all of the values in the column at once.\n","\n","In Example #1, every value in `station_london` is multiplied by 1.882. In Example #2, each value in `station_paris` is divided by the `station_antwerp` value in the same row, for every row in the `DataFrame`. The arithmetic operators (`+`, `-`, `*`, `/`) and comparison operators (`<`, `>`, `==`, etc.) all work element-wise on `pandas` columns."],"metadata":{"id":"oebpANy3c-aI"}}]}