In this post I am going to go through the code for a relatively simple R Shiny app that will predict Google Ads transactions and cpa (cost per acquistion) for three months plus the current month. There will be a graph and a table of values and four sliders to adjust spend for the current month and the coming three months. You’ll have to experiment with the app to get it to run the way you would like.

For the prediction, I am going to use the caret package (https://cran.r-project.org/web/packages/caret/vignettes/caret.html) and Glmnet (https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html). Caret is a wrapper funtion for a number of statistical packages, Glmnet being one of them.

Glmnet is a package created to fit generalized linear models via penalized maximum likelihood. This can be more accurate because there is a penalty calculation as the number of features (parameters) increases. It fits both linear and logistic regression models, as well as a number of other models, which makes it very flexible.

I want to add that I am a data analyst, not a statistician (although I have studied/continue to study statistics :-)), this is a sample app, and that each business will have its own set of features (parameters) that will work best in a given model. As a matter of fact, the model itself consists of a few lines of code.

Each business has to evaluate a number of models and features/parameters before settling on a model and features/parameters that will give the best prediction. This is why out-of-the-box solutions/platforms generally do not work (if an agency tells you that they have an AI platform for media that does not need modeling and tuning by a data analyst, statistician or data scientist…run!). For this app, I have chosen to use month, Google Ads match type, and ad type (brand/non-brand/other) as parameters/features.

Shiny App Set-up

First you need to declare your global variables and load the packages you need. If you don’t know how to access Google Ads through R, see my post on identifying poor performing Google ads (https://fujoanalytics.com/blog/how-to-use-r-to-identify-poor-performing-google-ads/).

# Set up global options.

# Set the locale, otherwise, 
# collation issues can throw errors.
Sys.setlocale('LC_ALL','C') 

# Use pacman to load libraries to make it easier to download
# if not installed.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse,
               RAdwords,
               caret,
               lubridate,
               shinythemes,
               DT,
               scales,
               googleAnalyticsR,
               googleAuthR)

# Reduce use of scientific notation.
options(scipen = 2)

# Authenicate access for Google Ads.
google_ads_auth <- doAuth(save = T)

For Google Analytics authentication, you can just use googleAnalyticsR authentication. However, if you do a lot of calls to GA in a short period of time, you will quickly run to the limit of this method. In that case, you can create your own Google Client ID and Client Secret (I give brief instructions on how to do that in my post on multi-channel attribution – https://fujoanalytics.com/blog/creating-a-multi-channel-attribution-r-shiny-app-from-google-analytics-data/).

Creating The UI

The UI is not complicated. It consists of a title, four slider inputs, an action button and two outputs – a graph and a table.

# User Interface ####
ui <- fluidPage(theme = shinytheme("lumen"), # Using a theme - there are many others.

                titlePanel("Google Ads w/Prediction"),

                fluidRow(

                        column(3,

                               h5('Pct Spend Chg (-95% to 300%) From Last Year:'),

                               # Slider for the current month change in spend compared to same time last year.
                               sliderInput('spend_chg_curr', paste0('For ', 
                                                               month.name[lubridate::month(Sys.Date())], ':'),
                                           min = -95,
                                           max = 300,
                                           value = 0,
                                           step = 5,
                                           ticks = TRUE
                               ),

                               # Slider for next month's change in spend, compared to same time last year.
                               sliderInput('spend_chg_1', paste0('For ', 
                                                               month.name[lubridate::month(Sys.Date() + 30)], ':'),
                                           min = -95,
                                           max = 300,
                                           value = 0,
                                           step = 5,
                                           ticks = TRUE
                               ),

                               # Slider for month after next month's change in spend, compared to same time last year.
                               sliderInput('spend_chg_2', paste0('For ', 
                                                               month.name[lubridate::month(Sys.Date() + 60)], ':'),
                                           min = -95,
                                           max = 300,
                                           value = 0,
                                           step = 5,
                                           ticks = TRUE
                               ),

                               # Slider for third month after this month's change in spend, compared to same time last year.
                               sliderInput('spend_chg_3', paste0('For ', 
                                                               month.name[lubridate::month(Sys.Date() + 90)], ':'),
                                           min = -95,
                                           max = 300,
                                           value = 0,
                                           step = 5,
                                           ticks = TRUE
                               ),

                               # Nothing happens till the goButton is pressed.
                               actionButton("goButton", "Calculate")
                        ),

                        column(9,

                               # This is the plot of media performace with prediction.
                               plotOutput("media_plot", width = "100%", height = "500px")

                        )

                ),

                fluidRow(column(12,

                                # This returns a table of campaigns and the predicted performance.
                                div(DT::dataTableOutput("media_table",
                                                        width = "100%",
                                                        height = "auto"),
                                                        style = "font-size:85%")

                )

       )

)

The sliders go from -95% to +300% of spend, compared to last year, and each of the four months (current month + next three months) are independently adjustable. If you have multiple ad accounts, you can add a dropdown for users to select an account and then pass that selection into the server code.

Creating the Server/Retrieving Google Ads and Google Analytics Data

All of the server code is in-between the following lines of code:

server <- shinyServer(function(input, output) {})

I first create a little function just to get the current month with a ‘0’ in front of it, if the month is between 1-9. For example, if today’s date is 2020-03-21, the function would return ’03’. I created this because I couldn’t find any function that could do this for me.

get_curr_month <- function(){

        begin_month <-  ifelse(nchar(month(Sys.Date())) == 1, # If the # digits is 1.
                               paste0('0', month(Sys.Date())), # add a '0', so 3 would be '03'.
                               month(Sys.Date())) 
        return(begin_month)

}

This next function retrieves the data from Google Ads and Google Analytics and joins them together. Follow along with the comments for the explanation as to how the code works.

ad_data <- function(account_id, ga_id) {

        # The begin date is current month, last year.
        begin_date <- as.Date(paste0(lubridate::year(Sys.Date()) - 1, '-', get_curr_month(), '-01'))
        end_date <- Sys.Date() # Current date.

        # Create statement to retrieve data from Google Ads.
        # We're using the KEYWORDS_PERFORMANCE_REPORT from the 
        # Google Ads api.
        body <- statement(select = c( 'Month',
                                      'CampaignId',
                                      'AdGroupId',
                                      'CampaignName',
                                      'AdGroupName',
                                      'Cost',
                                      'KeywordMatchType',
                                      'Conversions'),
                          report = "KEYWORDS_PERFORMANCE_REPORT",
                          start = begin_date,
                          end = end_date)

        # Query Adwords API and load data as dataframe.
        google_ads_data <- getData(clientCustomerId = account_id, 
                            google_auth = google_ads_auth,
                            statement = body)

        # We're going to add month and year columns 
        # to join with GA data. Then group and summarise the data.
        google_ads_data <- google_ads_data %>%
                            dplyr::rename(ads_date = Month,
                                          cost = Cost) %>%
                            dplyr::mutate(ads_date = as.Date(ads_date),
                                          year = lubridate::year(ads_date),
                                          month = lubridate::month(ads_date)) %>%
                            dplyr::group_by(ads_date, 
                                            year,
                                            month,
                                            CampaignID,
                                            AdgroupID,
                                            Adgroup,
                                            Matchtype) %>%
                            dplyr::summarise(cost = sum(cost),
                                             Conversions = sum(Conversions)) %>%
                            dplyr::ungroup()

        # Get ads data from GA - need the following:
        # Date, Campaign, Ad Group, and Transactions.
        # The Conversions in Google Ads may not be correct (often it's not),
        # so we will replace that with transactions from GA.
        ga_data <- google_analytics(ga_id,
                                    date_range = c(begin_date, 
                                                   end_date),
                                    dimensions = c('adwordsCampaignID',
                                                 'adwordsAdGroupID',
                                                 'year',
                                                 'month',
                                                 'adMatchType'), 
                                    metrics = c('transactions'),
                                    anti_sample = TRUE)

        # Rename a couple columns, change datatype of month and year,
        # and set match type.
        ga_data <- ga_data %>% dplyr::rename(CampaignID = adwordsCampaignID,
                                             AdgroupID = adwordsAdGroupID,
                                             Matchtype = adMatchType) %>%
                               dplyr::mutate(month = as.numeric(month),
                                             year = as.numeric(year),
                                             Matchtype = substr(Matchtype, 1, regexpr(' ', Matchtype) - 1))

        # Since the join columns are the same, 
        # we can use a simple statement to join 
        # Google Ads and GA dataframes.
        google_ads_data <- google_ads_data %>% dplyr::inner_join(ga_data)

        # We can drop Conversions, but this can be used
        # to compare to transactions - are they the same?
        google_ads_data$Conversions <- NULL

        # Rename transactions from GA to conversions - less confusing, I think.
        google_ads_data <- google_ads_data %>% 
                                        dplyr::rename(conversions = transactions)

        # Creating a new column for ad group type.
        # In this case, I'm looking for ad groups that start with
        # brand or non-brand (not case-sensitive). This will need
        # to be updated if your ad groups are named differently or
        # there are different types of ad groups you want to incorporate.
        # Then, we group by the month/year and ad group type 
        # and ad group match type.
        keyword_data <- google_ads_data %>%
                dplyr::rename(adgroup = Adgroup,
                              match_type = Matchtype) %>%
                dplyr::mutate(adgroup_type = 
                      dplyr::case_when(
                              startsWith(tolower(adgroup),'brand') > 0 ~ 'brand', 
                              startsWith(tolower(adgroup),'non-brand') > 0 ~ 'non-brand',
                              TRUE ~ 'other')) %>%
                dplyr::group_by(month,
                                year,
                                adgroup_type,
                                match_type) %>%
                dplyr::summarise(cost = sum(cost),
                                 conversions = sum(conversions)) %>%
                ungroup() %>%
                dplyr::mutate(month = as.factor(month),
                              adgroup_type = as.factor(adgroup_type),
                              match_type = as.factor(match_type),
                              new_month = ifelse(nchar(as.character(month)) == 1, 
                                                        paste0('0', month), month),
                              new_date = paste0(year, '-', new_month, '-01')) 

        # After calculating the date for the first day of each month/year,
        # we create a new ads_date column.
        keyword_data$ads_date <- as.Date(keyword_data$new_date)

        # These columns are not needed now 
        # that we have calculated ads_date.
        keyword_data$new_month <- NULL
        keyword_data$new_date <- NULL

        return(keyword_data)

}

And here is the calling function, which is reactive and available to the rest of the app. This is where you can hard-code your Google Ads account number and GA View ID (or pass it from a drop-down in the UI).

get_ad_data <- reactive({

        if(input$goButton == 0){return()}

        # Add Google Ads Account ID 
        # (Not MCC Id, it will not work)
        # and GA View ID. 
        account_id <- 'XXX-XXX-XXXX'
        ga_id <- 'XXXXXXXX'

        # Call ad_data function.
        ad_data <- ad_data(account_id, ga_id)  

        return(ad_data)

})

Ooh, and now for the predictive function. As you can see, the prediction functions themselves (toward the end) are not many lines of code.

predict_conv <- function(input_df){

        # Get the change from the slider inputs.
        month_curr_chg <- isolate(input$spend_chg_curr)
        month_1_chg <- isolate(input$spend_chg_1)
        month_2_chg <- isolate(input$spend_chg_2)
        month_3_chg <- isolate(input$spend_chg_3)

        # Calculate the change, if there is one, for each month. 
        month_curr_chg <- ifelse(month_curr_chg < 0, abs(month_curr_chg) *.01, month_curr_chg *.01 + 1)
        month_1_chg <- ifelse(month_1_chg < 0, abs(month_1_chg) *.01, month_1_chg *.01 + 1)
        month_2_chg <- ifelse(month_2_chg < 0, abs(month_2_chg) *.01, month_2_chg *.01 + 1)
        month_3_chg <- ifelse(month_3_chg < 0, abs(month_3_chg) *.01, month_3_chg *.01 + 1)

        # The date of the first day of the current month.
        cur_date <- as.Date(paste0(year(Sys.Date()), '-', get_curr_month(), '-01'))

        # The month # of the next three months 
        # following the current month (cur_date).
        cur_month <- month(Sys.Date())
        month_1 <- ifelse(cur_month == 12, 1, cur_month + 1)
        month_2 <- ifelse(cur_month == 11, 1, 
                        ifelse(cur_month == 12, 2, cur_month + 2))
        month_3 <- ifelse(cur_month == 10, 1, 
                          ifelse(cur_month == 11, 2, 
                                 ifelse(cur_month == 12, 3, cur_month + 3)))

        # Filter the data for this month # and the next
        # three months. Then remove the current month's data.
        # There will be four months of data from last year.
        keyword_data_forecast <- input_df %>%
                dplyr::filter(month %in% c(cur_month,
                                           month_1,
                                           month_2,
                                           month_3),
                              ads_date != cur_date,
                              ads_date > Sys.Date() - 395)

        # Make the ad_dates in the future (present for this month).        
        keyword_data_forecast$ads_date <- keyword_data_forecast$ads_date + 365

        # Incorporate any cost adjustments.
        keyword_data_forecast <- keyword_data_forecast %>%
                dplyr::mutate(cost = ifelse(month == cur_month, cost * month_curr_chg,
                                       ifelse(month == month_1, cost * month_1_chg,
                                         ifelse(month == month_2, cost * month_2_chg,
                                              ifelse(month == month_3, cost * month_3_chg, cost)))))

        # K-Fold Cross-Validation
        # Define training control
        train.control <- trainControl(method = "cv", number = 10)

        # Train the model.
        k_fold_glmnet_model <- train(conversions ~
                                             month +
                                             adgroup_type +
                                             match_type +
                                             cost,
                                     data = input_df,
                                     method = "glmnet",
                                     trControl = train.control)

        # Predict the outcome w/model.
        keyword_data_forecast$conversions <- 
                predict(object = k_fold_glmnet_model,
                        keyword_data_forecast[ , c( 'month',
                                                    'adgroup_type',
                                                    'match_type',
                                                    'cost')])

        return(keyword_data_forecast)

}

This function will be called by the graphing and table functions, along with the reactive get_ad_data dataframe.

Graphing the Data

The graph will show four trend lines:

  • Actual Transactions
  • Forecast Transactions
  • Actual CPA
  • Forecast Transactions

Since Tranactions and CPA are on different scales, we will need to calculate values on one or the other to move them so they fit together. Then, we need to add a second y-axis label and values. We’ll do that for CPA.

We will also add a vertical dashed line to distinguish ‘Actual’ from ‘Forecast’ and change the colors of each trend line when they cross over the vertical line.

 output$media_plot <- renderPlot({

        # If the goButton has not been pressed, exit function.  
        if(input$goButton == 0){return()}

        # Set the date for the first day of the current month.
        cur_date <- as.Date(paste0(year(Sys.Date()), '-', get_curr_month(), '-01'))

        # Get the reactive dataframe, 
        # filter out the current month,
        # group the rest and sum the cost and conversions
        # and calculate the cpa. 
        # Add the new column,  data_type_of, with the value of 'actual'
        ads_data <- get_ad_data() %>% 
                                dplyr::filter(ads_date != cur_date) %>%
                                dplyr::group_by(ads_date) %>%
                                dplyr::summarise(cost = sum(cost),
                                                 conversions = sum(conversions),
                                                 cpa = sum(cost)/sum(conversions)) %>%
                                dplyr::mutate(data_type_of = 'actual')

        # Create the predicted dataframe for this month and the next three motnhs.
        # call predict_conv using the get_ad_data dataframe as input.
        # Group, summarise and calculate, the same as above.
        # Add the new column,  data_type_of, with the value of 'predicted'
        ads_data_predict <- predict_conv(get_ad_data()) %>% 
                                        dplyr::group_by(ads_date) %>%
                                        dplyr::summarise(cost = sum(cost),
                                                         conversions = sum(conversions),
                                                         cpa = sum(cost)/sum(conversions)) %>%
                                dplyr::mutate(data_type_of = 'predicted')

        # Join the two dataframes together.
        ads_data <- rbind(ads_data, ads_data_predict)

        # Set the axis font, x-format and y-format.
        axis_font <- list(

                family = "Old Standard TT, serif",
                size = 10,
                color = "black"

        )

        x_format <- list(

                title = 'Date',
                tickangle = 45,
                tickfont = axis_font

        )

        y_format <- list(

                title = 'Conversions',
                tickfont = axis_font

        )

        # Separating the prediction data.
        ads_predict <- ads_data %>% filter(data_type_of == 'predicted')

        # This is used so the two lines can occupy the same space - conversions and cpa.
        scale_y <- mean(ads_data$conversions)/mean(ads_data$cpa)

        # These will be used in the labels placement.
        min_date <- min(ads_data$ads_date)
        max_date <- max(ads_data$ads_date)

        # For the labels: 'Actual' & 'Forecast' x-axis positions.
        mid_date_forecast <- (max_date - floor((max_date-cur_date)/2)) - 15
        mid_date_actual <- (cur_date - floor((cur_date-min_date)/2)) - 45

        # Draw the plot using ggplot(). 
        ggplot() +
                geom_line(aes(x = ads_data$ads_date, # This is the actual conversions line.
                                y = ads_data$conversions, group = 1, color = 'green'), size = 1.5) + 
                geom_line(aes(x = ads_predict$ads_date, # This is the predicted conversions line.
                                y = ads_predict$conversions, group = 1), color = '#527a52', size = 1.5) +
                geom_vline(xintercept = cur_date - 15, 
                           linetype = 'dashed', 
                           color = "grey", size = 1.5) + # This adds a dashed grey line 
                                                         # where the actual ends and the prediction begins
                xlab("Month") + ylab("Conversions") +  # x & y axis labels.                                                 
                scale_x_date(date_breaks = "months" , date_labels = "%b-%y")  + # How dates will breakdown.
                geom_line(aes(x = ads_data$ads_date, 
                              y = ads_data$cpa * scale_y, 
                              group = 1, 
                              color = 'blue'), 
                          size = 1.5) + # This is the actual cpa line.
                geom_line(aes(x = ads_predict$ads_date, 
                              y = ads_predict$cpa * scale_y, 
                              group = 1), 
                          color = '#52527a', size = 1.5) + # This is the predicted cpa line.
                scale_y_continuous(sec.axis = sec_axis(~./scale_y, name = "CPA")) + # This adds a second y axis for CPA.   
                scale_color_identity(name = "", # Creating the legend - 
                                                # have to construct since there are four different lines.
                                     breaks = c("green", "blue"),
                                     labels = c("Conversions", "CPA"),
                                     guide = "legend") +
                theme_bw() + theme(legend.position = "bottom", 
                                   legend.text = element_text(size = 14))  + # Setting theme & legend.
                annotate("text", 
                         x = mid_date_actual, 
                         y = Inf, 
                         label = "Actual", 
                         size = 8, 
                         hjust = 0, 
                         vjust = 2) + # Adds the 'Actual' label in the graph.
                annotate("text", 
                         x = mid_date_forecast, 
                         y = Inf, 
                         label = "Forecast", 
                         size = 8, 
                         hjust = 0, 
                         vjust = 2) # Adds the 'Forecast' label in the graph.

        })

Adding the Table

Finally, we’re going to add a table that summarises the data by month.

output$media_table <- DT::renderDataTable({

        # If the goButton has not been pressed, exit function. 
        if(input$goButton == 0){return()}

        # Set the date for the first day of the current month.
        cur_date <- as.Date(paste0(year(Sys.Date()), '-', get_curr_month(), '-01'))

        # Get the reactive dataframe, 
        # filter out the current month,
        # and calculate the cpa. 
        # Add the new column,  data_type_of, with the value of 'actual' 
        ads_data <- get_ad_data() %>% 
                dplyr::filter(ads_date != cur_date) %>%
                dplyr::mutate(type_of_data = 'actual',
                              cpa = cost/conversions)

        # Create the predicted dataframe for this month and the next three motnhs.
        # call predict_conv using the get_ad_data dataframe as input.
        # Add the new column,  data_type_of, with the value of 'predicted'
        ads_data_predict <- predict_conv(get_ad_data()) %>% 
                dplyr::mutate(type_of_data = 'predicted',
                              cpa = cost/conversions)

        # Join the two dataframes together.
        ads_data <- rbind(ads_data, ads_data_predict)

        # Group by month and summarise cost & conversions.
        # Calculate cpa as well.
        ads_data <- ads_data %>%
                dplyr::mutate(conversions = ifelse(conversions < 0, 0, conversions),
                              cpa = ifelse(conversions <= 0, 0, cpa)) %>%
                dplyr::group_by(ads_date) %>%
                dplyr::summarise(cost = sum(cost),
                                 conversions = sum(conversions),
                                 cpa = sum(cost)/sum(conversions)) %>%
                dplyr::ungroup()

        # Finally, return the data table.
        ads_data %>% select(ads_date, cost, conversions, cpa) %>%
        datatable(., 
                  selection = 'multiple', 
                  options = list(searching  = FALSE,
                                 paging = FALSE,
                                 order = list(list(1, 'desc'), list(2, 'desc'))),
                  caption = 'Google Ads Detail') %>%
                  formatRound(c('conversions'), 
                              digits = 0, 
                              interval = 3, 
                              mark = ",", 
                              dec.mark = getOption("OutDec")) 


})

Don’t forget to close the shinyServer function. To call the app, just add the following at the end:

shinyApp(ui = ui, server = server)

Conclusion

I hope that this post will give you some ideas as to how to incorporate predictive analytics into your analysis. This app is really meant to be a starting point (as it has been for me) to get you going. I would recommend learning more about modeling in general and playing with both machine learning models as well as simple linear regression models to find the best predictive models for your business.

In this app I used only three parameters/features, but it is best to find the ones that make the most sense to you.

Last modified: December 9, 2020