{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyP+rgFNnGFA3IN0RbmBwpZK"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Creating New Columns Based on Existing Columns\n","\n","In addition to the `.mean()`, `.median()`, `.describe()`, and `.agg()` methods, we can also use arithmetic operators to perform calculations on values in a `DataFrame`.\n","\n","Let's introduce a new sample dataset, this time with air quality data for measurement stations in London, Paris, and Antwerp. Values in this dataset include nitrogen dioxide (<code>NO<sub>2</sub></code>) concentration expressed as parts per million (`ppm`).\n"],"metadata":{"id":"EsEkwMtRcTJx"}},{"cell_type":"code","source":["import pandas as pd, numpy as np # import statements\n","df = pd.read_csv(\"https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch7/air_quality_no2.csv\", index_col=0, parse_dates=True) # use the first column as a datetime index\n","df.info() # summarize columns, data types, and non-null counts"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Or3qMl6ecnAA","executionInfo":{"status":"ok","timestamp":1705965183471,"user_tz":300,"elapsed":975,"user":{"displayName":"Katherine Walden","userId":"17094108395123900917"}},"outputId":"900619bc-ecfb-4463-9f59-f93c78035da5"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["<class 'pandas.core.frame.DataFrame'>\n","DatetimeIndex: 1035 entries, 2019-05-07 02:00:00 to 2019-06-21 02:00:00\n","Data columns (total 3 columns):\n"," #   Column           Non-Null Count  Dtype  \n","---  ------           --------------  -----  \n"," 0   station_antwerp  95 non-null     float64\n"," 1   station_paris    1004 non-null   float64\n"," 2   station_london   969 non-null    float64\n","dtypes: float64(3)\n","memory usage: 32.3 KB\n"]}]},{"cell_type":"markdown","source":["## Example #1\n","\n","Let's say we wanted to express the London station's 
<code>NO<sub>2</sub></code> concentration as milligrams per cubic meter (mg/m<sup>3</sup>). For our purposes, we assume a temperature of 25 degrees Celsius and a pressure of 1013 hPa, which means the conversion factor is 1.882.\n","\n","<p align=\"center\"><a href=\"https://github.com/kwaldenphd/eda-pandas/blob/main/figures/Figure_2.svg?raw=true\"><img class=\"aligncenter\" src=\"https://github.com/kwaldenphd/eda-pandas/blob/main/figures/Figure_2.svg?raw=true\" /></a></p>\n","\n","To do this, we convert every value in the `station_london` column from `ppm` to <code>mg/m<sup>3</sup></code> and store the results in a newly created column."],"metadata":{"id":"DuAeQLgccvtN"}},{"cell_type":"code","source":["df['london_mg_per_cubic'] = df['station_london'] * 1.882 # multiply each London value by the conversion factor and store the result in a new column\n","df.head() # show the first five rows"],"metadata":{"id":"caYbjmPwc1iO"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Example #2\n","\n","<p align=\"center\"><a href=\"https://github.com/kwaldenphd/eda-pandas/blob/main/figures/Figure_3.svg?raw=true\"><img class=\"aligncenter\" src=\"https://github.com/kwaldenphd/eda-pandas/blob/main/figures/Figure_3.svg?raw=true\" /></a></p>\n","\n","Let's say we wanted to calculate the ratio of the Paris station values to the Antwerp station values and store that result in a new column. 
To do so, we divide the Paris column by the Antwerp column; `pandas` computes the ratio for each row."],"metadata":{"id":"R8DT1P2udKo8"}},{"cell_type":"code","source":["df[\"ratio_paris_antwerp\"] = (df[\"station_paris\"] / df[\"station_antwerp\"]) # divide Paris values by Antwerp values, row by row\n","df.head() # show the first five rows"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":344},"id":"CkSV1sERdNuR","executionInfo":{"status":"ok","timestamp":1705965335856,"user_tz":300,"elapsed":137,"user":{"displayName":"Katherine Walden","userId":"17094108395123900917"}},"outputId":"fff74e21-534b-461d-e871-f3fc3c2b6765"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["                     station_antwerp  station_paris  station_london  \\\n","datetime                                                              \n","2019-05-07 02:00:00              NaN            NaN            23.0   \n","2019-05-07 03:00:00             50.5           25.0            19.0   \n","2019-05-07 04:00:00             45.0           27.7            19.0   \n","2019-05-07 05:00:00              NaN           50.4            16.0   \n","2019-05-07 06:00:00              NaN           61.9             NaN   \n","\n","                     ratio_paris_antwerp  \n","datetime                                  \n","2019-05-07 02:00:00                  NaN  \n","2019-05-07 03:00:00             0.495050  \n","2019-05-07 04:00:00             0.615556  \n","2019-05-07 05:00:00                  NaN  \n","2019-05-07 06:00:00                  NaN  "],"text/html":["\n","  <div id=\"df-2f8e5f3e-177f-4b1e-8151-56140ef43ce0\" class=\"colab-df-container\">\n","    <div>\n","<style scoped>\n","    .dataframe tbody tr th:only-of-type {\n","        vertical-align: middle;\n","    }\n","\n","    .dataframe tbody tr th {\n","        vertical-align: top;\n","    }\n","\n","    .dataframe thead th {\n","        text-align: right;\n","    }\n","</style>\n","<table border=\"1\" 
class=\"dataframe\">\n","  <thead>\n","    <tr style=\"text-align: right;\">\n","      <th></th>\n","      <th>station_antwerp</th>\n","      <th>station_paris</th>\n","      <th>station_london</th>\n","      <th>ratio_paris_antwerp</th>\n","    </tr>\n","    <tr>\n","      <th>datetime</th>\n","      <th></th>\n","      <th></th>\n","      <th></th>\n","      <th></th>\n","    </tr>\n","  </thead>\n","  <tbody>\n","    <tr>\n","      <th>2019-05-07 02:00:00</th>\n","      <td>NaN</td>\n","      <td>NaN</td>\n","      <td>23.0</td>\n","      <td>NaN</td>\n","    </tr>\n","    <tr>\n","      <th>2019-05-07 03:00:00</th>\n","      <td>50.5</td>\n","      <td>25.0</td>\n","      <td>19.0</td>\n","      <td>0.495050</td>\n","    </tr>\n","    <tr>\n","      <th>2019-05-07 04:00:00</th>\n","      <td>45.0</td>\n","      <td>27.7</td>\n","      <td>19.0</td>\n","      <td>0.615556</td>\n","    </tr>\n","    <tr>\n","      <th>2019-05-07 05:00:00</th>\n","      <td>NaN</td>\n","      <td>50.4</td>\n","      <td>16.0</td>\n","      <td>NaN</td>\n","    </tr>\n","    <tr>\n","      <th>2019-05-07 06:00:00</th>\n","      <td>NaN</td>\n","      <td>61.9</td>\n","      <td>NaN</td>\n","      <td>NaN</td>\n","    </tr>\n","  </tbody>\n","</table>\n","</div>\n","    <div class=\"colab-df-buttons\">\n","\n","  <div class=\"colab-df-container\">\n","    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-2f8e5f3e-177f-4b1e-8151-56140ef43ce0')\"\n","            title=\"Convert this dataframe to an interactive table.\"\n","            style=\"display:none;\">\n","\n","  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n","    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n","  </svg>\n","    </button>\n","\n","  <style>\n","    .colab-df-container {\n","      
display:flex;\n","      gap: 12px;\n","    }\n","\n","    .colab-df-convert {\n","      background-color: #E8F0FE;\n","      border: none;\n","      border-radius: 50%;\n","      cursor: pointer;\n","      display: none;\n","      fill: #1967D2;\n","      height: 32px;\n","      padding: 0 0 0 0;\n","      width: 32px;\n","    }\n","\n","    .colab-df-convert:hover {\n","      background-color: #E2EBFA;\n","      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n","      fill: #174EA6;\n","    }\n","\n","    .colab-df-buttons div {\n","      margin-bottom: 4px;\n","    }\n","\n","    [theme=dark] .colab-df-convert {\n","      background-color: #3B4455;\n","      fill: #D2E3FC;\n","    }\n","\n","    [theme=dark] .colab-df-convert:hover {\n","      background-color: #434B5C;\n","      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n","      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n","      fill: #FFFFFF;\n","    }\n","  </style>\n","\n","    <script>\n","      const buttonEl =\n","        document.querySelector('#df-2f8e5f3e-177f-4b1e-8151-56140ef43ce0 button.colab-df-convert');\n","      buttonEl.style.display =\n","        google.colab.kernel.accessAllowed ? 'block' : 'none';\n","\n","      async function convertToInteractive(key) {\n","        const element = document.querySelector('#df-2f8e5f3e-177f-4b1e-8151-56140ef43ce0');\n","        const dataTable =\n","          await google.colab.kernel.invokeFunction('convertToInteractive',\n","                                                    [key], {});\n","        if (!dataTable) return;\n","\n","        const docLinkHtml = 'Like what you see? 
Visit the ' +\n","          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n","          + ' to learn more about interactive tables.';\n","        element.innerHTML = '';\n","        dataTable['output_type'] = 'display_data';\n","        await google.colab.output.renderOutput(dataTable, element);\n","        const docLink = document.createElement('div');\n","        docLink.innerHTML = docLinkHtml;\n","        element.appendChild(docLink);\n","      }\n","    </script>\n","  </div>\n","\n","\n","<div id=\"df-bf1e6541-31f1-45f8-bf90-e93d3b7faf8c\">\n","  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-bf1e6541-31f1-45f8-bf90-e93d3b7faf8c')\"\n","            title=\"Suggest charts\"\n","            style=\"display:none;\">\n","\n","<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n","     width=\"24px\">\n","    <g>\n","        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n","    </g>\n","</svg>\n","  </button>\n","\n","<style>\n","  .colab-df-quickchart {\n","      --bg-color: #E8F0FE;\n","      --fill-color: #1967D2;\n","      --hover-bg-color: #E2EBFA;\n","      --hover-fill-color: #174EA6;\n","      --disabled-fill-color: #AAA;\n","      --disabled-bg-color: #DDD;\n","  }\n","\n","  [theme=dark] .colab-df-quickchart {\n","      --bg-color: #3B4455;\n","      --fill-color: #D2E3FC;\n","      --hover-bg-color: #434B5C;\n","      --hover-fill-color: #FFFFFF;\n","      --disabled-bg-color: #3B4455;\n","      --disabled-fill-color: #666;\n","  }\n","\n","  .colab-df-quickchart {\n","    background-color: var(--bg-color);\n","    border: none;\n","    border-radius: 50%;\n","    cursor: pointer;\n","    display: none;\n","    fill: var(--fill-color);\n","    height: 32px;\n","    padding: 0;\n","    width: 32px;\n","  }\n","\n","  .colab-df-quickchart:hover {\n","    
background-color: var(--hover-bg-color);\n","    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n","    fill: var(--button-hover-fill-color);\n","  }\n","\n","  .colab-df-quickchart-complete:disabled,\n","  .colab-df-quickchart-complete:disabled:hover {\n","    background-color: var(--disabled-bg-color);\n","    fill: var(--disabled-fill-color);\n","    box-shadow: none;\n","  }\n","\n","  .colab-df-spinner {\n","    border: 2px solid var(--fill-color);\n","    border-color: transparent;\n","    border-bottom-color: var(--fill-color);\n","    animation:\n","      spin 1s steps(1) infinite;\n","  }\n","\n","  @keyframes spin {\n","    0% {\n","      border-color: transparent;\n","      border-bottom-color: var(--fill-color);\n","      border-left-color: var(--fill-color);\n","    }\n","    20% {\n","      border-color: transparent;\n","      border-left-color: var(--fill-color);\n","      border-top-color: var(--fill-color);\n","    }\n","    30% {\n","      border-color: transparent;\n","      border-left-color: var(--fill-color);\n","      border-top-color: var(--fill-color);\n","      border-right-color: var(--fill-color);\n","    }\n","    40% {\n","      border-color: transparent;\n","      border-right-color: var(--fill-color);\n","      border-top-color: var(--fill-color);\n","    }\n","    60% {\n","      border-color: transparent;\n","      border-right-color: var(--fill-color);\n","    }\n","    80% {\n","      border-color: transparent;\n","      border-right-color: var(--fill-color);\n","      border-bottom-color: var(--fill-color);\n","    }\n","    90% {\n","      border-color: transparent;\n","      border-bottom-color: var(--fill-color);\n","    }\n","  }\n","</style>\n","\n","  <script>\n","    async function quickchart(key) {\n","      const quickchartButtonEl =\n","        document.querySelector('#' + key + ' button');\n","      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n","      
quickchartButtonEl.classList.add('colab-df-spinner');\n","      try {\n","        const charts = await google.colab.kernel.invokeFunction(\n","            'suggestCharts', [key], {});\n","      } catch (error) {\n","        console.error('Error during call to suggestCharts:', error);\n","      }\n","      quickchartButtonEl.classList.remove('colab-df-spinner');\n","      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n","    }\n","    (() => {\n","      let quickchartButtonEl =\n","        document.querySelector('#df-bf1e6541-31f1-45f8-bf90-e93d3b7faf8c button');\n","      quickchartButtonEl.style.display =\n","        google.colab.kernel.accessAllowed ? 'block' : 'none';\n","    })();\n","  </script>\n","</div>\n","    </div>\n","  </div>\n"]},"metadata":{},"execution_count":2}]},{"cell_type":"markdown","source":["## Element-Wise Calculation\n","\n","Note that we do not need to iterate over the rows of a dataframe column to perform this calculation. `pandas` performs the calculation element-wise, that is, on all of the values in the column at once.\n","\n","In example #2, because this is an element-wise calculation, the division operation is applied to every row in the data frame. Rows where either station value is missing yield `NaN`, as the output above shows. Python's other arithmetic operators (`+`, `-`, `*`, `/`) and comparison operators (`<`, `>`, `==`, etc.) also work element-wise."],"metadata":{"id":"oebpANy3c-aI"}}]}