Last active
March 16, 2023 08:32
-
-
Save Sassano-Weld/c465af7f644a97802707ce4b02e7c09c to your computer and use it in GitHub Desktop.
Sales forecasting model written for GOOGLE BIGQUERY
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--------**********--------- | |
/* | |
AUTHOR: CHRISTIAN SASSANO/DARYL PAUL : WELD TECHNOLOGIES ApS | |
PUBLISHED: MAY 2022 | |
https://weld.app | |
Join us in slack: https://join.slack.com/t/weldcommunity/shared_invite/zt-19b372339-1m9ftp0YW2AU8ToZpBViAA | |
Sales forecasting model written for GOOGLE BIGQUERY using the following inputs | |
CUSTOMER_ID|FIRST_ENTERED_SALES_FUNNEL_DATE|FIRST_CONVERTED_DATE|DAYS_TO_WON|ENTERED_SALES_FUNNEL|CONVERTED | |
Explain each CTE | |
variables: Set Forecast Start/End Date, Perc of Sales Funnel to Exclude when calculating win rate, number of most recent days to take average of to estimate future qualified leads | |
NOTE: whatever FORECAST_START_DATE you set, the model will ignore all data that would not have been known by that date. (no cheating) | |
current companies: inputs from your business. Table with rows: CUSTOMER_ID|FIRST_ENTERED_SALES_FUNNEL_DATE|FIRST_CONVERTED_DATE|DAYS_TO_WON|ENTERED_SALES_FUNNEL|CONVERTED | |
date_dim: bq specific date series | |
daily deals created: establish how many additions to top funnel happen each day - add zeroes for days that do not have data | |
-----intermediate steps to establishing smoothed cumulative win curve----- | |
win_curve | |
current_smoothed_curve_prep | |
current_smoothed_curve | |
-------------------------------------------------------------------- | |
current_forecast_periods: creating cumulative win curve from intermediate steps | |
days_to_exclude: takes % (from VARIABLE: perc_last_salesfunnel_exclude) to calculate the # of days to exclude from recent data when calculating Win Rate Assumption | |
current_win_rate: outputs ONE number -- overall Win Rate | |
daily_top_funnel_avg: outputs ONE number: average of most recent X days (depending on VARIABLE: num_days_to_est_top_funnel) | |
daily_top_funnel_est: takes number from daily_top_funnel_avg and assigns it to all days in forecast period | |
current_known_company_forecast: forecasting probability for each customer that exists in pipeline (as of beginning of forecast period) | |
current_unknown_company_forecast: forecasting probability for future pipeline customers (as of beginning of forecast period) | |
current_known_company_daily_forecast: aggregating known pipeline into rows per day | |
current_unknown_company_daily_forecast: aggregating future pipeline deals into row per day | |
current_daily_forecast: combining known and future pipeline into value per day | |
current_known_company_monthly_forecast: aggregating known pipeline into rows per month | |
current_unknown_company_monthly_forecastaggregating future pipeline deals into row per month | |
current_monthly_forecast: combining known and future pipeline into value per day | |
final_monthly: expected leads to have been won broken out by the month they are expected to be won in, and the month in which they were added to the sales funnel | |
final_daily_forecast_vs_actual: daily forecast vs actuals | |
*/ | |
--------**********--------- | |
with | |
variables as ( | |
--enter your custom variables here | |
select | |
cast('2021-06-01' as date) as forecast_start_date , | |
cast('2021-12-30' as date) as forecast_end_date , | |
.95 as perc_last_salesfunnel_exclude, -- usually this would be set to something like .99, but as we've only had about 1 full sales funnel, i bumped it | |
92 as num_days_to_est_top_funnel -- this X day period will calculate the avg top funnel activities per day and use that to estimate the rest | |
), | |
--select * from variables | |
current_companies as ( | |
select | |
customer_id as customer_id, | |
entered_sales_funnel as deal_created, | |
first_entered_sales_funnel_date as first_entered_sales_funnel_date, | |
case when first_converted_date < forecast_start_date then converted else 0 end as converted, | |
--we do not know that these companies will win if they are before the forecast_start_date | |
first_converted_date as first_converted_date, | |
days_to_won, | |
variables.* | |
from {{analytics.sales_forecast.company_simple_funnel_demo}} | |
join variables on 1=1 | |
where | |
first_entered_sales_funnel_date < forecast_start_date | |
-- ^ only taking companies created before beginning of forecast period | |
), | |
date_dim as ( | |
--BIGQUERY SYNTAX FOR GENERATING DATE DIMENSION TABLE | |
SELECT | |
day as date | |
, date_trunc(day, month) as month | |
, date_trunc(day, year) as year | |
, EXTRACT(YEAR FROM day) as year_int | |
, EXTRACT(MONTH FROM day) as month_int | |
, EXTRACT(day FROM day) as day_int | |
, EXTRACT(DAYOFWEEK FROM day) as day_of_week_int, | |
CONCAT(CAST(EXTRACT(YEAR FROM day) AS STRING), "-Q", CEIL(EXTRACT(MONTH FROM day) / 3)) AS quarter, | |
CONCAT(CAST(EXTRACT(YEAR FROM day) AS STRING), "-T", CEIL(EXTRACT(MONTH FROM day) / 4)) AS tertial | |
FROM UNNEST( | |
GENERATE_DATE_ARRAY(DATE('2000-01-01'), '2099-12-31')) AS day | |
), | |
actuals_conversions as ( | |
-- will use this to join to forecast at the end | |
select date, sum(coalesce(converted,0)) as num_conversions_actual | |
from {{analytics.sales_forecast.company_simple_funnel_demo}} | |
right join date_dim on first_converted_date = date_dim.date | |
group by 1 | |
), | |
actuals_top_funnel as ( | |
-- will use this to join to forecast at the end | |
select date, sum(coalesce(entered_sales_funnel,0)) as num_deals_created_actual | |
from `analytics.sales_forecast__company_simple_funnel_demo` | |
right join date_dim on first_entered_sales_funnel_date = date_dim.date | |
group by 1 | |
), | |
daily_deals_created as ( | |
select dim_day, sum(coalesce(deal_created,0)) deals_created | |
from current_companies | |
right join | |
(select | |
date as dim_day | |
from date_dim | |
where date between (select min(first_entered_sales_funnel_date) from current_companies) and current_date + interval 1 year) | |
on first_entered_sales_funnel_date = dim_day | |
group by 1 | |
order by 1 | |
), | |
win_curve as ( | |
select | |
days_to_won, | |
(sum(converted) over (order by days_to_won) / sum(converted) over ()) as perc_won_by | |
from | |
current_companies | |
where | |
(case when first_converted_date >= forecast_start_date then 0 else converted end) = 1 | |
-- removing deals won after forecast start from numerator but not denominator | |
order by | |
days_to_won | |
), | |
-- creating smooth win curve to create value for every day | |
current_smoothed_curve_prep as ( | |
SELECT | |
*, | |
min(days_to_won) over (order by days desc) as next_known_day, | |
max(days_to_won) over (order by days) as last_known_day, | |
min(perc_won_by) over (order by days desc) as next_known_perc, | |
max(perc_won_by) over (order by days) as last_known_perc, | |
min(days_to_won) over (order by days desc) - coalesce(max(days_to_won) over (order by days),0) known_day_diff | |
FROM | |
UNNEST(GENERATE_ARRAY(1, (select max(days_to_won) from win_curve))) days -- BQ specific method for generating column of integers | |
left join win_curve on days = days_to_won | |
order by | |
days | |
), | |
current_smoothed_curve as | |
--lots of window functions, this fills in values for every day by doing some arithmetic on window functions from _prep step | |
( | |
select | |
*, | |
coalesce( | |
perc_won_by, | |
(((coalesce(days - last_known_day, days) * (next_known_perc - coalesce(last_known_perc, 0))) / known_day_diff) + coalesce(last_known_perc, 0)) | |
) as smoothed_perc_won_by | |
from | |
current_smoothed_curve_prep | |
), | |
current_smoothed_win_curve as ( | |
select distinct days, smoothed_perc_won_by | |
from current_smoothed_curve | |
--outputs for each day, an estimated % of total deals that have been won by that day since deal creation | |
), | |
current_forecast_periods as ( | |
SELECT | |
days, | |
smoothed_perc_won_by, | |
(coalesce(smoothed_perc_won_by - lag(smoothed_perc_won_by) over (order by days),smoothed_perc_won_by)) as perc_won_discrete_forecast | |
from current_smoothed_win_curve | |
order by days | |
), | |
days_to_exclude as ( | |
--this function returns one number, which will give us a bound on calculating the win rate on our upper funnel numbers | |
select min(days) as exclude_range | |
from current_smoothed_win_curve | |
join variables on 1=1 | |
where smoothed_perc_won_by > perc_last_salesfunnel_exclude | |
), | |
--select * from days_to_exclude | |
current_win_rate as ( | |
select | |
sum(converted) / sum(deal_created) as win_rate -- could easily overwrite this line to some static number | |
from current_companies | |
join days_to_exclude on 1=1 | |
where | |
first_entered_sales_funnel_date < least(forecast_start_date, current_date) - exclude_range | |
and coalesce(first_converted_date, forecast_start_date-1) < forecast_start_date | |
--need to calculate win rate such that you don't count days in the excluded period against the overall win % | |
), | |
daily_top_funnel_avg as ( | |
select avg(deals_created) as avg_deals_created_per_day | |
from daily_deals_created | |
left join variables on 1=1 | |
where dim_day between least(forecast_start_date,current_date) - num_days_to_est_top_funnel and least(forecast_start_date,current_date) - 1 | |
-- ^ only taking relevant dates | |
), | |
daily_top_funnel_est as ( | |
select dim_day, | |
avg_deals_created_per_day as deals_created_est | |
from daily_deals_created | |
left join daily_top_funnel_avg on 1=1 | |
left join variables on 1=1 | |
where dim_day between least(forecast_start_date,current_date) and forecast_end_date | |
-- using the estimates from before to forecast future dates in forecast period | |
), | |
current_known_company_forecast as ( | |
select | |
customer_id, | |
first_entered_sales_funnel_date, | |
date_add(first_entered_sales_funnel_date, interval fp.days day) as forecast_period_date, | |
--adding the number of days to each period | |
perc_won_discrete_forecast * wr.win_rate as perc_won_discrete_forecast | |
--^multiplying by win rate | |
from current_companies | |
join current_forecast_periods fp on 1=1 | |
join current_win_rate wr on 1=1 | |
where converted = 0 | |
--don't want to forecast more wins from already won companies | |
), | |
current_unknown_company_forecast as ( | |
select | |
dim_day, | |
date_add(dim_day, interval fp.days day) as forecast_period_date, | |
deals_created_est * (perc_won_discrete_forecast * wr.win_rate) as perc_won_discrete_forecast | |
--^multiplying by win rate AND future deal created est | |
from daily_top_funnel_est | |
join current_forecast_periods fp on 1=1 | |
join current_win_rate wr on 1=1 | |
order by 1,2 | |
), | |
current_known_company_daily_forecast as ( | |
-- for each month, the forecast for each company | |
select | |
date_trunc(forecast_period_date, day) forecast_day, | |
date_trunc(first_entered_sales_funnel_date, day) cohort_day, | |
round(sum(perc_won_discrete_forecast),3) expected_current_deals_won | |
from | |
current_known_company_forecast | |
join variables on 1=1 | |
where | |
forecast_period_date between forecast_start_date and forecast_end_date | |
group by | |
1 ,2 | |
order by | |
1 desc, 2 | |
), | |
current_unknown_company_daily_forecast as ( | |
-- forecast for each estimated additional future deal | |
select | |
date_trunc(forecast_period_date, day) forecast_day, | |
date_trunc(dim_day, day) cohort_day, | |
round(sum(perc_won_discrete_forecast),3) expected_future_deals_won | |
from | |
current_unknown_company_forecast | |
join variables on 1=1 | |
where | |
forecast_period_date between forecast_start_date and forecast_end_date | |
group by | |
1 ,2 | |
order by | |
1 desc, 2 | |
), | |
current_daily_forecast as | |
( | |
select | |
coalesce(fp.forecast_day, cp.forecast_day) forecast_day, | |
-- coalesce(fp.cohort_day, cp.cohort_day) cohort_day, -- do not add cohort months to daily forecast | |
sum(cp.expected_current_deals_won) as expected_current_deals_won, | |
sum(fp.expected_future_deals_won) as expected_future_deals_won, | |
sum(coalesce(cp.expected_current_deals_won,0) + coalesce(fp.expected_future_deals_won,0)) as total_expected_deals_won | |
from current_known_company_daily_forecast cp | |
full outer join current_unknown_company_daily_forecast fp | |
on fp.forecast_day = cp.forecast_day and fp.cohort_day=cp.cohort_day | |
group by 1 | |
order by 1,2 | |
), | |
current_known_company_monthly_forecast as ( | |
-- for each month, the forecast for each company | |
select | |
date_trunc(forecast_period_date, month) forecast_month, | |
date_trunc(first_entered_sales_funnel_date, month) cohort_month, | |
(sum(perc_won_discrete_forecast)) expected_current_deals_won | |
from | |
current_known_company_forecast | |
join variables on 1=1 | |
where | |
forecast_period_date between forecast_start_date and forecast_end_date | |
group by | |
1 ,2 | |
order by | |
1 desc, 2 | |
), | |
current_unknown_company_monthly_forecast as ( | |
-- forecast for each estimated additional future deal | |
select | |
date_trunc(forecast_period_date, month) forecast_month, | |
date_trunc(dim_day, month) cohort_month, | |
(sum(perc_won_discrete_forecast)) expected_future_deals_won | |
from | |
current_unknown_company_forecast | |
join variables on 1=1 | |
where | |
forecast_period_date between forecast_start_date and forecast_end_date | |
group by | |
1 ,2 | |
--order by | |
-- 1 desc, 2 | |
), | |
-- select * | |
-- from current_unknown_company_monthly_forecast | |
current_monthly_forecast as | |
( | |
select | |
coalesce(fp.forecast_month, cp.forecast_month) forecast_month, | |
coalesce(fp.cohort_month, cp.cohort_month) cohort_month, | |
cp.expected_current_deals_won, | |
--cp.expected_current_deals_won_cum, | |
fp.expected_future_deals_won, | |
--fp.expected_future_deals_won_cum, | |
coalesce(cp.expected_current_deals_won,0) + coalesce(fp.expected_future_deals_won,0) as total_expected_deals_won | |
from current_known_company_monthly_forecast cp | |
full outer join current_unknown_company_monthly_forecast fp | |
on fp.forecast_month = cp.forecast_month and fp.cohort_month=cp.cohort_month | |
order by 1,2 | |
), | |
final_monthly as ( | |
select | |
cmf.*, | |
round(sum(total_expected_deals_won) over (order by forecast_month, cohort_month),3) as total_expected_deals_won_cum, | |
round(win_rate,3) as estimated_win_rate, | |
round(avg_deals_created_per_day,3) as estimated_avg_deals_created_per_day, | |
exclude_range | |
from current_monthly_forecast cmf | |
join current_win_rate wr on 1=1 | |
join daily_top_funnel_avg tf on 1=1 | |
join days_to_exclude de on 1=1 | |
order by 1,2 | |
), | |
final_daily_forecast_vs_actual as | |
( | |
select actuals_conversions.date, | |
round(sum(total_expected_deals_won) over (order by forecast_day),3) as total_expected_deals_won_cum, | |
case when forecast_day >= CURRENT_DATE() then null else round(sum(num_conversions_actual) over (order by forecast_day),3) end as total_actual_deals_won_cum | |
from current_daily_forecast | |
left join variables on 1=1 | |
left join actuals_conversions on forecast_day = date | |
) | |
-------------**********-------------- | |
/* | |
End of CTEs for modelling. Expain how to call the model with their data using one of the select *'s | |
select * from final_monthly --OUTPUTS MONTHLY FORECAST WITH ASSUMPTIONS INCLUDED AS COLUMNS | |
select * from final_daily_forecast_vs_actual --OUTPUTS DAILY FORECAST WITH ESTIMATES VS ACTUALS | |
## | |
select *, | |
round(sum(total_expected_deals_won) over (order by forecast_day),3) as total_expected_deals_won_cum | |
from current_daily_forecast | |
## OUTPUTS DAILY FORECAST | |
select * from current_forecast_periods -- Outputs Cumulative Distribution Function | |
## | |
select date, | |
case when date < forecast_start_date then num_deals_created_actual else null end as deals_created_previous, | |
case when date >= forecast_start_date and date <= CURRENT_DATE() then num_deals_created_actual else null end as deals_created_forecast_period, | |
case when date >= forecast_start_date then avg_deals_created_per_day else null end as est_avg_deals_created_per_day | |
from top_funnel_actuals | |
left join daily_top_funnel_avg on 1=1 | |
left join variables on 1=1 | |
where date between forecast_start_date - num_days_to_est_top_funnel and forecast_end_date | |
## OUTPUTS TOP FUNNEL ASSUMPTIONS / ACTUALS over forecast period (and pre-forecast period) | |
Join us in slack: https://join.slack.com/t/weldcommunity/shared_invite/zt-19b372339-1m9ftp0YW2AU8ToZpBViAA | |
*/ | |
select * from final_monthly | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment