This tutorial guides you through data manipulation with the Pandas library in Python. By the end, you will be able to load, inspect, filter, and modify datasets using core Pandas functions and methods.
What Will You Learn?
- Loading data into Pandas
- Inspecting data
- Filtering data
- Modifying data
Prerequisites
This tutorial assumes that you have a basic understanding of Python programming. It would also be helpful to have some familiarity with data structures, particularly lists and dictionaries.
Loading Data into Pandas
Pandas can read data from a variety of sources, such as CSV files, Excel files, and SQL databases. The most common function used to load data is pd.read_csv().
import pandas as pd
# Load data from a CSV file
df = pd.read_csv('file.csv')
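If you do not have a CSV file at hand, the same call can be tried on text held in memory. The following is a minimal sketch: the column names and rows are hypothetical sample data, not the contents of any real 'file.csv'. It relies on pd.read_csv() accepting a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of 'file.csv'
csv_text = """FirstName,LastName,Age,Gender
Ada,Lovelace,36,Female
Alan,Turing,41,Male
Grace,Hopper,19,Female
"""

# pd.read_csv() accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```

The first line of the text is treated as the header row, so the DataFrame ends up with four named columns and three data rows.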
Inspecting Data
You can inspect the first few rows of the DataFrame using the .head() method, or the last few rows using the .tail() method. Both return 5 rows by default; pass a number, as in df.head(10), to change that.
# Print the first 5 rows
print(df.head())
# Print the last 5 rows
print(df.tail())
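Beyond .head() and .tail(), a few other attributes and methods are commonly used for a first look at a dataset. The sketch below uses a small hypothetical DataFrame with columns 'A' and 'B' so that it runs on its own.

```python
import pandas as pd

# Hypothetical data, used only to make the example self-contained
df = pd.DataFrame({"A": [10, 60, 75], "B": [1.5, 2.5, 3.5]})

print(df.shape)       # tuple of (number of rows, number of columns)
print(df.dtypes)      # data type of each column
df.info()             # column names, non-null counts, and memory usage
print(df.describe())  # summary statistics for the numeric columns
print(df.head(2))     # first 2 rows instead of the default 5
```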
Filtering Data
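Filtering is one of the most frequent data manipulation operations. Pandas filters rows with boolean indexing: a comparison such as df['A'] > 50 produces a True or False value for each row, and indexing the DataFrame with that result keeps only the rows marked True.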
# Filter rows where column 'A' is greater than 50
filtered_df = df[df['A'] > 50]
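To make the filtering step more concrete, the sketch below separates the comparison from the indexing and then combines two conditions. The DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only to make the example self-contained
df = pd.DataFrame({"A": [10, 60, 75, 40], "B": [5, 3, 8, 1]})

# The comparison alone yields one True/False value per row
mask = df["A"] > 50
print(mask)

# Indexing with the mask keeps only the rows where it is True
filtered_df = df[mask]

# Combine conditions with & (and) or | (or); each one needs parentheses
both = df[(df["A"] > 50) & (df["B"] > 5)]
either = df[(df["A"] > 50) | (df["B"] > 4)]

print(filtered_df)
print(both)
print(either)
```

The parentheses matter because & and | bind more tightly than comparison operators in Python. Python's own `and` and `or` keywords do not work element by element on a Series, which is why the symbols are used instead.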
Modifying Data
DataFrames are mutable, so you can modify them in place, for example by adding columns or changing existing values.
# Add a new column 'C' which is the sum of 'A' and 'B'
df['C'] = df['A'] + df['B']
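The snippet above adds a column; changing existing values is usually done with .loc, which selects rows and columns by label or condition. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data, used only to make the example self-contained
df = pd.DataFrame({"A": [10, 60, 75], "B": [1, 2, 3]})

# Add a new column computed from existing ones
df["C"] = df["A"] + df["B"]

# Change values: set 'B' to 0 in every row where 'A' is greater than 50
df.loc[df["A"] > 50, "B"] = 0

print(df)
```

Note that 'C' was computed before 'B' was changed, so it still reflects the original values of 'B'. Derived columns are not recalculated automatically.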
Example 1: Load and Inspect Data
import pandas as pd
# Load data
df = pd.read_csv('file.csv')
# Inspect first 5 rows
print(df.head())
Example 2: Filter Data
# Filter rows where 'Age' is greater than 30
filtered_df = df[df['Age'] > 30]
# Print the filtered data
print(filtered_df)
Example 3: Modify Data
# Add a new column 'FullName' which is the combination of 'FirstName' and 'LastName'
df['FullName'] = df['FirstName'] + ' ' + df['LastName']
# Print the DataFrame to check the result
print(df)
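Examples 2 and 3 assume that 'file.csv' contains 'Age', 'FirstName', and 'LastName' columns. To try them without such a file, the sketch below builds a DataFrame directly from hypothetical rows and then applies the same two steps.

```python
import pandas as pd

# Hypothetical rows standing in for the contents of 'file.csv'
df = pd.DataFrame({
    "FirstName": ["Ada", "Alan", "Grace"],
    "LastName": ["Lovelace", "Turing", "Hopper"],
    "Age": [36, 41, 19],
})

# Example 2: keep rows where 'Age' is greater than 30
filtered_df = df[df["Age"] > 30]

# Example 3: build 'FullName' from the two name columns
df["FullName"] = df["FirstName"] + " " + df["LastName"]

print(filtered_df)
print(df)
```

Because filtered_df was created before 'FullName' was added, it does not contain the new column. Filter again after modifying df if the filtered result needs it.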
In this tutorial, we have covered how to load, inspect, filter, and modify data using Pandas in Python. As next steps, you might want to explore more advanced data manipulation techniques such as grouping and aggregation, joining and merging data, and working with time series data.
Exercise 1: Load a dataset from a CSV file and print the first 10 rows.
Solution:
import pandas as pd
# Load data
df = pd.read_csv('file.csv')
# Print the first 10 rows
print(df.head(10))
Exercise 2: Filter the DataFrame to include only rows where 'Age' is less than 20 and 'Gender' is 'Female'. Print the filtered DataFrame.
Solution:
# Filter data
filtered_df = df[(df['Age'] < 20) & (df['Gender'] == 'Female')]
# Print the filtered data
print(filtered_df)
Exercise 3: Add a new column 'AgeInMonths', which is 'Age' multiplied by 12. Print the DataFrame to check the result.
Solution:
# Add a new column 'AgeInMonths'
df['AgeInMonths'] = df['Age'] * 12
# Print the DataFrame
print(df)
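The three solutions can also be checked against data whose expected results are easy to work out by hand. The sketch below uses a hypothetical four-row dataset in place of 'file.csv'.

```python
import pandas as pd

# Hypothetical dataset standing in for 'file.csv'
df = pd.DataFrame({
    "Age": [17, 25, 19, 15],
    "Gender": ["Female", "Female", "Male", "Female"],
})

# Exercise 1: head(10) returns all rows when fewer than 10 exist
first_rows = df.head(10)

# Exercise 2: rows where 'Age' is below 20 and 'Gender' is 'Female'
filtered_df = df[(df["Age"] < 20) & (df["Gender"] == "Female")]

# Exercise 3: age expressed in months
df["AgeInMonths"] = df["Age"] * 12

print(first_rows)
print(filtered_df)
print(df)
```

Only the first and last rows satisfy both conditions in Exercise 2, so filtered_df keeps their original index labels, 0 and 3.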