본문 바로가기

머신러닝

파이썬_쇼핑몰 매출 분석1(판매 데이터 전처리, 전체매출, 지역별 매출 분석)

 

 

 

 

 

목표

  1. 아이템별 지표 확인하기
  2. 시간별 지역별 판매 지표 확인하기
In [1]:
import numpy as np
import pandas as pd
# seaborn
import seaborn as sns
COLORS = sns.color_palette()
import matplotlib as plt

%matplotlib inline
 

데이터 로딩

  1. 정제된 데이터 사용(retail.csv)
In [2]:
dtypes = {
    'UnitPrice': np.float32,
    'CustomerID': np.int32,
    'Quantity': np.int32
}
retail = pd.read_csv('./OnlineRetailClean.csv', dtype=dtypes)
retail.head()
Out[2]:
  Unnamed: 0 InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country CheckoutPrice
0 0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/2010 8:26 2.55 17850 United Kingdom 15.30
1 1 536365 71053 WHITE METAL LANTERN 6 12/1/2010 8:26 3.39 17850 United Kingdom 20.34
2 2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 12/1/2010 8:26 2.75 17850 United Kingdom 22.00
3 3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/2010 8:26 3.39 17850 United Kingdom 20.34
4 4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 12/1/2010 8:26 3.39 17850 United Kingdom 20.34
 

날짜 타입 데이터 변환

  • 문자열로 로딩하는 것보다 date/datetime 타입으로 로딩하는 것이 분석에 용이
In [3]:
retail['InvoiceDate']=pd.to_datetime(retail['InvoiceDate'],infer_datetime_format=True)
retail.head()
Out[3]:
  Unnamed: 0 InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country CheckoutPrice
0 0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850 United Kingdom 15.30
1 1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850 United Kingdom 20.34
2 2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850 United Kingdom 22.00
3 3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850 United Kingdom 20.34
4 4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850 United Kingdom 20.34
 

해당 기간 동안의 매출

  • 전체 매출
  • 국가별 매출
  • 월별 매출
  • 요일별 매출
  • 시간별 매출
 

전체 매출

In [4]:
total_revenue=retail['CheckoutPrice'].sum()
total_revenue
Out[4]:
8911407.904
 

국가별 매출

In [5]:
rev_by_countries=retail.groupby('Country').sum()['CheckoutPrice'].sort_values()
rev_by_countries
Out[5]:
Country
Saudi Arabia            1.459200e+02
Bahrain                 5.484000e+02
Czech Republic          8.267400e+02
RSA                     1.002310e+03
Brazil                  1.143600e+03
European Community      1.300250e+03
Lithuania               1.661060e+03
Lebanon                 1.693880e+03
United Arab Emirates    1.902280e+03
Unspecified             2.667070e+03
Malta                   2.725590e+03
USA                     3.580390e+03
Canada                  3.666380e+03
Iceland                 4.310000e+03
Greece                  4.760520e+03
Israel                  7.221690e+03
Poland                  7.334650e+03
Austria                 1.019868e+04
Cyprus                  1.359038e+04
Italy                   1.748324e+04
Denmark                 1.895534e+04
Channel Islands         2.045044e+04
Singapore               2.127929e+04
Finland                 2.254608e+04
Portugal                3.343989e+04
Norway                  3.616544e+04
Japan                   3.741637e+04
Sweden                  3.837833e+04
Belgium                 4.119634e+04
Switzerland             5.644395e+04
Spain                   6.157711e+04
Australia               1.385213e+05
France                  2.090240e+05
Germany                 2.288671e+05
EIRE                    2.655459e+05
Netherlands             2.854463e+05
United Kingdom          7.308392e+06
Name: CheckoutPrice, dtype: float64
In [6]:
plot=rev_by_countries.plot(kind='bar',figsize=(20,10))
plot.set_xlabel('Country',fontsize=11)
plot.set_ylabel('Revenue',fontsize=11)
plot.set_title('Revenue by Country', fontsize=13)
plot.set_xticklabels(labels=rev_by_countries.index, rotation=45)
Out[6]:
[Text(0, 0, 'Saudi Arabia'),
 Text(0, 0, 'Bahrain'),
 Text(0, 0, 'Czech Republic'),
 Text(0, 0, 'RSA'),
 Text(0, 0, 'Brazil'),
 Text(0, 0, 'European Community'),
 Text(0, 0, 'Lithuania'),
 Text(0, 0, 'Lebanon'),
 Text(0, 0, 'United Arab Emirates'),
 Text(0, 0, 'Unspecified'),
 Text(0, 0, 'Malta'),
 Text(0, 0, 'USA'),
 Text(0, 0, 'Canada'),
 Text(0, 0, 'Iceland'),
 Text(0, 0, 'Greece'),
 Text(0, 0, 'Israel'),
 Text(0, 0, 'Poland'),
 Text(0, 0, 'Austria'),
 Text(0, 0, 'Cyprus'),
 Text(0, 0, 'Italy'),
 Text(0, 0, 'Denmark'),
 Text(0, 0, 'Channel Islands'),
 Text(0, 0, 'Singapore'),
 Text(0, 0, 'Finland'),
 Text(0, 0, 'Portugal'),
 Text(0, 0, 'Norway'),
 Text(0, 0, 'Japan'),
 Text(0, 0, 'Sweden'),
 Text(0, 0, 'Belgium'),
 Text(0, 0, 'Switzerland'),
 Text(0, 0, 'Spain'),
 Text(0, 0, 'Australia'),
 Text(0, 0, 'France'),
 Text(0, 0, 'Germany'),
 Text(0, 0, 'EIRE'),
 Text(0, 0, 'Netherlands'),
 Text(0, 0, 'United Kingdom')]
 
In [7]:
plot=(rev_by_countries/total_revenue).plot(kind='pie',figsize=(20,10))
 
 

그래프 유틸 함수

In [8]:
def plot_bar(df, xlabel,ylabel,title,color=COLORS[0], figsize=(30,10), rotation=45):
    plot=df.plot(kind='bar')
    plot.set_xlabel(xlabel,fontsize=11)
    plot.set_ylabel(ylabel,fontsize=11)
    plot.set_title(title, fontsize=13)
    plot.set_xticklabels(labels=df.index, rotation=rotation)
    
plot_bar(rev_by_countries,'Country','Revenue','Revenue by Country')