How to Analyze Ecommerce Purchases: A Step-by-Step Guide
By Deepnote team
•Updated on November 23, 2023
Learn to process and analyze San Francisco city employee salary data using Pandas. This tutorial covers data manipulation techniques, visualization, and insights extraction.
In this tutorial, we'll explore a dataset of fictional ecommerce purchases, showcasing how to effectively utilize Python and Pandas for data analysis. Our dataset is a compilation of simulated transactions on Amazon. Remember, the data is fabricated and may not align with real-world scenarios.
Step 1: Setting Up Your Environment
Firstly, import the Pandas library and load the dataset into a DataFrame. We'll name our DataFrame ecom
for easy reference.
import pandas as pd
ecom = pd.read_csv('Ecommerce Purchases')
Step 2: Getting to Know Your Data
Before diving into analysis, it's crucial to understand your dataset's structure. Use ecom.head()
to view the first few rows.
ecom.head()
This will display the top entries in your DataFrame, giving you a glimpse of the various columns and types of data available.
Address Lot AM or PM Browser Info Company Credit Card CC Exp Date CC Security Code CC Provider Email Job IP Address Language Purchase Price
0 16629 Pace Camp Apt. 448\nAlexisborough, NE 77... 46 in PM Opera/9.56.(X11; Linux x86_64; sl-SI) Presto/2... Martinez-Herman 6011929061123406 02/20 900 JCB 16 digit pdunlap@yahoo.com Scientist, product/process development 149.146.147.205 el 98.14
1 9374 Jasmine Spurs Suite 508\nSouth John, TN 8... 28 rn PM Opera/8.93.(Windows 98; Win 9x 4.90; en-US) Pr... Fletcher, Richards and Whitaker 3337758169645356 11/18 561 Mastercard anthony41@reed.com Drilling engineer 15.160.41.51 fr 70.73
2 Unit 0065 Box 5052\nDPO AP 27450 94 vE PM Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... Simpson, Williams and Pham 675957666125 08/19 699 JCB 16 digit amymiller@morales-harrison.com Customer service manager 132.207.160.22 de 0.95
3 7780 Julia Fords\nNew Stacy, WA 45798 36 vm PM Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0 ... Williams, Marshall and Buchanan 6011578504430710 02/24 384 Discover brent16@olson-robinson.info Drilling engineer 30.250.74.19 es 78.04
4 23012 Munoz Drive Suite 337\nNew Cynthia, TX 5... 20 IE AM Opera/9.58.(X11; Linux x86_64; it-IT) Presto/2... Brown, Watson and Andrews 6011456623207998 10/25 678 Diners Club / Carte Blanche christopherwright@gmail.com Fine artist 24.140.33.94 es 77.82
Step 3: Basic Statistics and Information
To get a sense of the scale of your dataset, use ecom.info()
to view the number of entries, columns, and data types.
ecom.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
Address 10000 non-null object
Lot 10000 non-null object
AM or PM 10000 non-null object
Browser Info 10000 non-null object
Company 10000 non-null object
Credit Card 10000 non-null int64
CC Exp Date 10000 non-null object
CC Security Code 10000 non-null int64
CC Provider 10000 non-null object
Email 10000 non-null object
Job 10000 non-null object
IP Address 10000 non-null object
Language 10000 non-null object
Purchase Price 10000 non-null float64
dtypes: float64(1), int64(2), object(11)
memory usage: 1.1+ MB
Step 4: Calculating Average Purchase Price
To find the average purchase price, apply the mean()
function to the Purchase Price
column.
ecom['Purchase Price'].mean()
50.34730200000025
Step 5: Identifying Price Extremes
Determine the highest and lowest purchase prices with the max()
and min()
functions.
highest_price = ecom['Purchase Price'].max()
lowest_price = ecom['Purchase Price'].min()
Step 6: Language Preferences
To see how many customers prefer English, filter the DataFrame by the Language
column.
english_speakers = ecom[ecom['Language']=='en'].count()
Step 7: Job Title Analysis
If you're interested in how many customers are lawyers, filter by the Job
column.
lawyers_count = ecom[ecom['Job'] == 'Lawyer'].count()
30
Step 8: Purchase Times
You can use value_counts()
to see how many purchases were made in the AM vs. PM.
purchase_times = ecom['AM or PM'].value_counts()
purchase_times
PM 5068
AM 4932
Name: AM or PM, dtype: int64
Step 9: Common Job Titles
Identify the most common job titles using value_counts()
on the Job
column.
common_jobs = ecom['Job'].value_counts().head(5)
common_jobs
Interior and spatial designer 31
Lawyer 30
Social researcher 28
Purchasing manager 27
Designer, jewellery 27
Name: Job, dtype: int64
Step 10: Specific Transaction Details
To find details about a specific transaction, filter by a unique identifier, like Lot
.
specific_purchase = ecom[ecom['Lot']=='90 WT']['Purchase Price']
specific_purchase
513 75.1
Name: Purchase Price, dtype: float64
Step 11: Finding Email by Credit Card Number
If you need to find a user's email based on their credit card number:
user_email = ecom[ecom["Credit Card"] == 4926535242672853]['Email']
user_email
1234 bondellen@williams-garza.com
Name: Email, dtype: object
Step 12: High-value Transactions
To analyze high-value transactions with a specific credit card provider:
high_value_transactions = ecom[(ecom['CC Provider']=='American Express') & (ecom['Purchase Price']>95)].count()
high_value_transactions
Address 39
Lot 39
AM or PM 39
Browser Info 39
Company 39
Credit Card 39
CC Exp Date 39
CC Security Code 39
CC Provider 39
Email 39
Job 39
IP Address 39
Language 39
Purchase Price 39
dtype: int64
Step 13: Credit Card Expiration Analysis
For credit cards expiring in a specific year, use a lambda function to filter.
expiring_cards_2025 = sum(ecom['CC Exp Date'].apply(lambda x: x[3:]) == '25')
expiring_cards_2025
1033
Step 14: Popular Email Providers
Finally, to determine the most popular email providers among your customers:
email_providers = ecom['Email'].apply(lambda x: x.split('@')[1]).value_counts().head(5)
email_providers
hotmail.com 1638
yahoo.com 1616
gmail.com 1605
smith.com 42
williams.com 37
Name: Email, dtype: int64
Summary
In this tutorial, we've explored various methods to analyze an ecommerce dataset using Python and Pandas. We covered basic DataFrame operations, statistical analysis, and data filtering techniques, demonstrating how to extract meaningful insights from an ecommerce dataset. These techniques are fundamental in data science and can be applied to a wide range of datasets for insightful analysis.