{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"toc_visible":true,"authorship_tag":"ABX9TyOTs3nAt5cC2sUcX14ItFdE"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Filtering\n","\n","We can use Python's comparison operators to return rows in our `DataFrame` that meet specific conditions."],"metadata":{"id":"CrtOEmT1Cqpe"}},{"cell_type":"code","source":["subset = df[df['MAR'] == '1'] # create new dataframe with specific MAR values\n","subset # show output"],"metadata":{"id":"cxhD-n8eDDL2"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["In this example, we had to put `1` in quotation marks, because this column is being treated as a string object.\n","\n","We would need to change the data type to be able to make numeric comparisons or filters."],"metadata":{"id":"OuvaCX2DDfGP"}},{"cell_type":"code","source":["df = df.astype(int) # change datatype for all clumns\n","df.info() # show updated technical summary"],"metadata":{"id":"jadPtagPDt9u"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["subset2 = df[df['AGEP'] > 30] # create new df with AGEP values over specific threshold\n","subset2 # show output"],"metadata":{"id":"L4qQBA9fDVlo"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["We use brackets (`[]`) to set a condition rows must meet to be assigned to the new dataframe. If we just wanted to see whether rows meet this condition in the original `DataFrame`, we could just test for the condition without creating a new `DataFrame`."],"metadata":{"id":"LBtJNGi1D_AP"}},{"cell_type":"code","source":["df['AGEP'] > 30 # return boolean values for conditional test"],"metadata":{"id":"wslIhBYeEAKw"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Advanced Filtering"],"metadata":{"id":"yFULDdyaFBbT"}},{"cell_type":"markdown","source":["### `.isin()`\n","\n","The `isin()` conditional function on its own would return a `True` or `False` value. By nesting the `isin()` function in brackets (`[]`), we are filtering rows based on rows that meet the function critera, or return as `True` from this function."],"metadata":{"id":"mV5YNcuvEKpO"}},{"cell_type":"code","source":["subset3 = df[df['MAR'].isin([2,3])] # create new df with specific MAR values\n","subset3 # show output"],"metadata":{"id":"yquy1BbQETaV"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### Boolean Operators\n","\n","We could also break out the chained or compound conditional statement using an `OR` operator, `|`."],"metadata":{"id":"J_jMoMK0Egr-"}},{"cell_type":"code","source":["subset4 = df[(df['MAR'] == 2) | (df['AGEP'] > 50)] # filter using two conditions and the OR operator\n","subset4 # show output"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":424},"id":"1A7a_zDbEliA","executionInfo":{"status":"ok","timestamp":1705958923149,"user_tz":300,"elapsed":124,"user":{"displayName":"Katherine Walden","userId":"17094108395123900917"}},"outputId":"9f64d9b9-c446-4b2d-899f-6dca1a299350"},"execution_count":10,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" SEX PWGTP MAR SCHL\n","1 2 23 2 24\n","3 1 80 5 24\n","5 1 107 3 24\n","8 2 127 5 24\n","12 1 70 5 24\n","... ... ... ... ...\n","44072 1 67 1 24\n","44073 1 127 1 24\n","44074 2 127 1 24\n","44075 2 56 1 24\n","44078 1 102 1 24\n","\n","[34170 rows x 4 columns]"],"text/html":["\n","
\n"," | SEX | \n","PWGTP | \n","MAR | \n","SCHL | \n","
---|---|---|---|---|
1 | \n","2 | \n","23 | \n","2 | \n","24 | \n","
3 | \n","1 | \n","80 | \n","5 | \n","24 | \n","
5 | \n","1 | \n","107 | \n","3 | \n","24 | \n","
8 | \n","2 | \n","127 | \n","5 | \n","24 | \n","
12 | \n","1 | \n","70 | \n","5 | \n","24 | \n","
... | \n","... | \n","... | \n","... | \n","... | \n","
44072 | \n","1 | \n","67 | \n","1 | \n","24 | \n","
44073 | \n","1 | \n","127 | \n","1 | \n","24 | \n","
44074 | \n","2 | \n","127 | \n","1 | \n","24 | \n","
44075 | \n","2 | \n","56 | \n","1 | \n","24 | \n","
44078 | \n","1 | \n","102 | \n","1 | \n","24 | \n","
34170 rows × 4 columns
\n","