From Data File to Data Frame

It’s far more likely that you will load structured data from a file into Python, rather than manually creating a DataFrame. For this section of the lab, we’re going to work with data about Titanic passengers. Navigate to https://raw.githubusercontent.com/kwaldenphd/pandas-intro/main/data/titanic.csv in a web browser to see the dataset.

Loading Data#

We can load structured data into Python from a file on our computer or from a URL using pd.read_csv(). Here is how we would load the titanic.csv file as a pandas DataFrame:

# import pandas
import pandas as pd

# load titanic data from csv file
# titanic = pd.read_csv("titanic.csv")

# load titanic data from url
titanic = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/pandas-intro/main/data/titanic.csv")

# show first 5 rows of newly-loaded dataframe
titanic.head(5)

pandas provides the read_csv() function, which reads .csv data into a pandas DataFrame. The same read_ prefix is used for other structured data file formats, for example pd.read_excel(), pd.read_json(), and pd.read_sas().
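As a rough illustration of the shared read_ pattern, the sketch below parses the same three records once from CSV text and once from JSON text. The sample values and variable names are invented for this example and are not taken from the Titanic file; io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical sample records, not taken from titanic.csv
csv_text = "Name,Age,Fare\nAnn,29,7.25\nBen,41,71.28\nCara,35,8.05\n"
json_text = (
    '[{"Name": "Ann", "Age": 29, "Fare": 7.25},'
    ' {"Name": "Ben", "Age": 41, "Fare": 71.28},'
    ' {"Name": "Cara", "Age": 35, "Fare": 8.05}]'
)

# Both readers accept a path, a URL, or a file-like object
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text), orient="records")

# Compare the shape and column names of the two results
print(from_csv.shape, from_json.shape)
print(list(from_csv.columns) == list(from_json.columns))
```

Either call produces a DataFrame, so everything that follows in this section (head(), dtypes, info()) works the same way regardless of the original file format.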

titanic

We can also check the data type for each column using .dtypes.

titanic.dtypes

From this output, we know the columns hold integers (int64), floats (float64), and strings (reported as object). For a more technical summary of this DataFrame, we can use .info().
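To make the mapping from values to dtypes concrete, here is a minimal sketch on a small, made-up DataFrame (the rows are invented, not read from the Titanic file). It checks each column with the helper functions in pandas.api.types instead of comparing dtype names directly, since the exact label pandas shows for text columns can differ between versions.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical three-row sample, not taken from titanic.csv
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],        # whole numbers
    "Fare": [7.25, 71.28, 8.05],     # decimal numbers
    "Name": ["Ann", "Ben", "Cara"],  # text
})

# One dtype per column
print(sample.dtypes)

# Ask pandas what kind of data each column holds
is_int = ptypes.is_integer_dtype(sample["PassengerId"])
is_float = ptypes.is_float_dtype(sample["Fare"])
is_text = ptypes.is_string_dtype(sample["Name"])
print(is_int, is_float, is_text)
```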

titanic.info()

.info() prints the index range and number of entries, the column names, each column's data type, and the number of non-null values in each column. We can see from the Non-Null Count values that some columns have null or missing values, because their counts are lower than the total number of entries. .info() also reports how much memory (RAM) is used to store this DataFrame.
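The pieces of the .info() report can also be retrieved individually. The sketch below uses a small, made-up DataFrame with a few deliberately missing values (not the Titanic data) to show that .info() writes a report instead of returning one, and how .isna().sum() and .memory_usage() expose the missing-value counts and memory figures as objects we can work with.

```python
import io
import pandas as pd

# Hypothetical sample with some missing values, not taken from titanic.csv
sample = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cara"],
    "Age": [29.0, None, 35.0],
    "Cabin": [None, "C85", None],
})

# info() writes its report to a buffer (stdout by default) and returns None
buffer = io.StringIO()
result = sample.info(buf=buffer)
report = buffer.getvalue()
print(report)

# Count missing values per column directly
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Memory used by the index and each column, in bytes
bytes_per_column = sample.memory_usage(deep=True)
print(bytes_per_column)
```

Running the same three calls on titanic in place of sample should give the corresponding figures for the full dataset.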