Traffic Acquisition GA4 report in BigQuery [2024 Guide]

Table of Contents

We will focus on creating a BigQuery traffic acquisition report similar to the one in GA4.

Traffic Acquisition GA$

Moving to BigQuery is the first steps to having more control over the data you collect. But once the data is there, you still need to query the exported tables to get the metrics and dimensions you want.

How to enable Bigquery Export

If you did not enable the GA4 export to BigQuery, check to this guide to get started.

🔗 Enable GA4-BigQuery Export

This standard report very useful to understand where your traffic is coming from and dig deeper into each channel.

Traffic acquisition query structure

We are going to break this query to 3 stages:

  1. We are going to unnest the basic metrics that are available at the event level.
  2. Next, we work on creating a custom channel group like the one you see in the table below the chart in acquisition report
Traffic Acquisition Channels

What unnesting means?

GA4 is an event-based tool. When we export data to BigQuery, each event has many nested rows containing data, such as which page the event was triggered on. To access this data, you need to extract the specific field (e.g., the page location) from the nested row.

Example of a nested row.

In the last step, we will calculate all the session metrics before scheduling our query.

Step #1: defining the query fields

  • Dimensions

By looking at the fields in the GA4 acquisition report, we can get an idea of what fields we need to construct the query.

Traffic Acquisition Dimensions

Here are some of the most important dimensions we will use in our query:

DimensionDescription
Default channel groupHow GA4 groups traffic from different channels: social organic, paid search, and so on.
Session mediumBroad definition of traffic coming from a source: paid, organic, email…
Session sourceThe name of the platform or website where traffic is coming from: Google, Facebook, LinkedIn…
Session campaignThe campaign name you added to the UTM parameter
Key eventThe name of the conversion that occurred on your website, if you defined one: purchase, sign-up, download…

We will extract other fields related to geography (city, region, etc.) and device information, but they are not as important as the dimensions we will use for attributing each session.

  • Metrics

For the metrics, they are going to revolve around sessions, similar to how the report is structured in GA4.

Traffic Acquisition Metrics

With that being said, here is a list of the metrics we will recreate using our query.

MetricDescription
SessionsThe count of visits (sessions).
Engaged SessionsSessions that meet GA4’s criteria to be considered as engaged:
“An engaged session is a session that lasts longer than 10 seconds, has a key event, or has at least 2 page views or screen views.”
Engagement rateWhen you divide the engaged sessions by total sessions, you get the engagement rate.
Average engagement time per sessionAverage session duration is calculated by dividing total engagement time by the number of sessions.
Sessions Key event rateThe old gold conversion rate, now rebranded by GA4, is calculated by dividing key events (conversions) by total sessions to get the average conversion rate per session.

Step #2: extracting the query fields

WITH flat_table AS (
  SELECT
    parse_date('%Y%m%d', event_date) AS date,
    event_name,
    event_timestamp,
    -- 1️⃣ Define conversions 
    CASE WHEN event_name IN ('file_download', 'invitee_meeting_scheduled') THEN event_name END AS key_event,
    -- 2️⃣ Get session ID, define sessions, and engaged sessions
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key='ga_session_id') AS session_id,
    CONCAT((SELECT value.int_value FROM UNNEST(event_params) WHERE key='ga_session_id'), user_pseudo_id) AS unique_sessions,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key='session_engaged') AS session_engaged,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key='engagement_time_msec') AS   engagement_time_msec,
    -- 3️⃣ Obtain source, medium, and campaign for accurate session attribution.
    traffic_source.name AS campaign,
    traffic_source.source AS source,
    traffic_source.medium AS medium,
    geo.country,
    geo.city,
    geo.region,
    device.category,
    device.operating_system,
    device.mobile_os_hardware_model,
  FROM `<project_id>.events_*` -- Replace <project_id> with your project ID
  -- 4️⃣ Dynamic date range for data retrieval using a 6-month window.
  WHERE _table_suffix BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH))
                           AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
)
SQL

The query may be lengthy, but it’s quite straightforward. You can understand my approach by reviewing the comments in sections 1️⃣, 2️⃣, and so forth.

  • 1️⃣ We first defined the conversions manually, as our GA4 exported table lacks a conversions field.
  • 2️⃣ Next, we extracted fields like session ID.
  • We defined a session by combining the user pseudo ID and session ID
  • We extracted a parameter session_engaged that we will use in subsequent queries to define an engaged session.

ℹ️ Formula to unnest an event parameter

Use this to extract an integer value for a given parameter.

(SELECT value.int_value FROM UNNEST(event_params) WHERE key='event_key') AS any_name_here

Alternatively, you can use a string value instead of in_value to retrieve the text if a string value exists.

3️⃣ In the 3rd step, we extracted some fields to attribute traffic, namely source, medium, and campaign; we will use the source and medium later on to define channel groups.

Step #3: basic metrics calculation and channel grouping

We will use yet another common table expression (CTE) as an intermediate step where we calculate some basic metrics using the fields flat_table.

basic_metrics as(
select 
    date,
    source,medium,campaign,
  -- Conver engagement time to seconds
  engagement_time_msec/1000 as engagement_time,
  key_event,
  -- 1️⃣ Define engaged sessions
  CASE WHEN session_engaged = '1' THEN session_id END AS engaged_session,
  unique_sessions
   -- Country and device dimensions
    country, city, region, category, operating_system, mobile_os_hardware_model,

  -- 2️⃣ Custom channnel group definition
 case
        when (source = 'direct' or source is null) 
            and (regexp_contains(medium, r'^(\(not set\)|\(none\))$') or medium is null) 
            then 'Direct'
        when regexp_contains(campaign, r'^(.*shop.*)$') 
            and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 
            then 'Shopping Paid'
        when regexp_contains(source, r'^(google|bing)$') 
            and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 
            then 'search_paid'
        when regexp_contains(source, r'^(twitter|facebook|fb|instagram|ig|linkedin|pinterest)$')
            and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*|social_paid)$') 
            then 'Social Paid'
        when regexp_contains(source, r'^(youtube)$')
            and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 
            then 'Video Paid'
        when regexp_contains(medium, r'^(display|banner|expandable|interstitial|cpm)$') 
            then 'Display'
        when regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 
            then 'Other Paid'
        when regexp_contains(medium, r'^(.*shop.*)$') 
            then 'Shopping Organic'
        when regexp_contains(source, r'^.*(twitter|t\.co|facebook|instagram|linkedin|lnkd\.in|pinterest).*') 
            or regexp_contains(medium, r'^(social|social_advertising|social-advertising|social_network|social-network|social_media|social-media|sm|social-unpaid|social_unpaid)$') 
            then 'Social Organic'
        when regexp_contains(medium, r'^(.*video.*)$') 
            then 'video_organic'
        when regexp_contains(source, r'^(google|bing|yahoo|baidu|duckduckgo|yandex|ask)$') 
            or medium = 'organic'
            then 'Search Organic'
        when regexp_contains(source, r'^(email|mail|e-mail|e_mail|e mail|mail\.google\.com)$') 
            or regexp_contains(medium, r'^(email|mail|e-mail|e_mail|e mail)$') 
            then 'Email'
        when regexp_contains(medium, r'^(affiliate|affiliates)$') 
            then 'Affiliate'
        when medium = 'referral'
            then 'referral'
        when medium = 'audio' 
            then 'Audio'
        when medium = 'sms'
            then 'SMS'
        when ends_with(medium, 'push')
            or regexp_contains(medium, r'.*(mobile|notification).*') 
            then 'Mobile Push'
        else 'Other'
    end as custom_channel_group,
   
from flat_table)
SQL

In the basic_metrics, we first defined what is an enaged session at 1️⃣. This was done useing a field that returns 1 when a session was labled as enagegd. Next, we went straight to the definition of the custom channel groups at 2️⃣. We used case statements to define each channel.

ℹ️ Case statements

Case statements are very similar to an if statement in Excel; the operating principle is the same

if a value matches the confdition then do X.

Step #4: final calculations of Traffic acquisition metrics

In the final step, we will levarge the metrics and diemnsion we created in basic_metrics.

select 
    date,
    -- 1️⃣  Make attribution session-level
    max(source) as source,
    max(medium) as medium,
    max(campaign) as  campaign,
    max(custom_channel_group) as channel_group,
    country, 
    city, 
    region, 
    category, 
    operating_system, 
    mobile_os_hardware_model,
    key_event as conversion,
    -- 2️⃣ Calculate session metrics
    round(safe_divide(count(key_event),count(unique_sessions)),2) as conversion_rate, 
    count(key_event) as conversions,
    round(safe_divide(sum(engagement_time),count(unique_sessions)),2) as engagement_time_per_session,
    round(safe_divide(count(engaged_session),count(unique_sessions)),2) as enagement_rate,
    count(distinct unique_sessions) as sessions, 
    count(distinct engaged_session) as engaged_sessions,
from basic_metrics 
group by date,country, city, region, category, operating_system, mobile_os_hardware_model,conversion--,channel_group
order by date desc
SQL

In the last query, we use the max() function to make source, medium, and campaign session-based (at 1️⃣ ). A returning user might come from different sources over a month, so we only want to capture the most recent channel they used. This way, we avoid having different channel values for the same date.

At 2️⃣ , we calculated the conversion rate by dividing the count of key events by the number of sessions. the use of safe_divide() is a good practise to not get an error when dividing by 0.

ℹ️ safe_divide()

Takes 2 arguments:
• numerator: The value you want to divide.
• denominator: The value by which you want to divide the numerator.

SAFE_DIVIDE(numerator, denominator)

Step #5: test and compare traffic acquisition reports

We will run the query, then explore it in Looker Studio. This will let us compare data over a specific time range and check if the numbers are close to GA4.

When i compare the data for a 30 day period here are the results i get:

  • Bigquery Acquisition report – Sesssions: 233

GA4 Acquisition report – Sesssions: 213

We have a difference of about 9% which is not that bad. Of course getting 100% is ideal but unfortunately it can be quite hard to achieve. As long as the difference is acceptable you can move on with your day.

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Go Further