Combine Google Search Console with GA4 in BigQuery [2024]

Table of Contents

Google Search Console is very useful as it is, but how about taking your analysis to the next level by joining it with GA4 data. There are some requirements for this to work, which we will go through.

Why combine GA4 data with BigQuery?

Even if you linked Google Search Console to your GA4 property, you won’t see this information in the export. You need to take the Google Search Console data to BigQuery using the bulk export feature.

Note

In this article, we are going to jump straight to the query. We assume you already:

1. Linked GA4 to Bigquery

2. Enabled the bulk export to move Google Search Console data to Bigquery.

Query structure

Our goal is to understand what organic keywords brought users to our website. First, we are going to prepare the data from our GA4 event table. In the second part, we will do the same for Google Search Console. Finally, we will visualize the data using Looker studio.

Preparing GA4 Data

We will prepare the GA4 fields that we will join later on with Google Search Console:

FieldTypeDescription
landing_pagestringThe name of the landing page.
session_idstringThe unique ID of the visit by the user.
sourcestringThe source of visitor (LinkedIn, email…)
mediumstringThe support used by the user (organic, CPC, email…)
datedateThe date when the user triggered an event.

with ga4_start as(
SELECT 
-- format date 
  PARSE_DATE('%Y%m%d',event_date) as date,
  -- get unique seesion_id
  concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key='ga_session_id')) as session_id,
  -- Get the landing page
  max(case when (select value.int_value from unnest(event_params) where event_name='page_view' and key='entrances')=1 then (select value.string_value from unnest(event_params) where event_name='page_view' and key='page_location') end) as landing_page,
  max((select value.string_value from unnest(event_params) where key='medium')) as medium,
  max((select value.string_value from unnest(event_params) where key='source')) as source,
 FROM `analygo.analytics_434200232.events_*` 
 -- define dynamic date range
 where _table_suffix between  format_date('%Y%m%d', date_sub(current_date(),interval 1 month)) 
 and format_date('%Y%m%d', date_sub(current_date(),interval 1 day))
 group by 1,2
 -- Query only data from Google organic
 having source='google' and medium='organic'),
SQL

In ga4_start, we selected all traffic from the organic medium and google search engine as the source. We also extracted other fields : date and landing_page which will be used as keys to join with GSC data.

Querying GA4 data

After getting all the fields, it’s time to do some calculations and prepare the final GA4 query.

 --Aggregate data from ga4_start
 ga4_data as(
 select 
 date,
 landing_page,
 count(distinct session_id) as unique_sessions
  from ga4_start group by 1,2),
SQL

Preparing Google Search Console data

Google Search Console export is split into two tables:

1️⃣ searchdata_url_impression: the data is aggregated by the URL (more granular view than 2️⃣).

2️⃣ searchdata_site_impression: GSC data aggregated by the property (no URL data).

We are using the first table because it has the URL data.

First, we need to get the necessary fields from searchdata_url_impression.

FieldTypeDescription
landing_pagestringThe name of the landing page.
querystringThe query of the user.
urlstringThe page URL visited by the user.
countrystringCountry where the user did the search.
datedateThe date when the user landed on the page.
impressionsintThe number of impression of our URL on the Google search result.
clicksintThe number of clicks on the URL from the Google search result page.

The GSC data querying is more straightforward. there are no nested fields. We just need to grab the data directly from the searchdata_url_impression table.

  -- Get search console fields
 gsc_data as (
select 
  data_date as date,
  query,
  -- New fiel to get the branded/non-brnded querues
  case when regexp_contains(query,'(analygo)|(hamza)|(kharraz)') then "branded" else 'non-branded' end as branded_query,
  country,
  url as landing_page,
  -- Calculate average postion
  avg(sum_position) as avg_position, 
  sum(impressions) as impressions,
  sum(clicks) as clicks  
from `analygo.searchconsole_hamzaelkharraz.searchdata_url_impression`
where data_date between  date_sub(current_date(),interval 1 month) 
 and date_sub(current_date(),interval 1 day) group by 1,2,3,4,5 having query is not null) 
SQL

We created a new calculated field, branded_query. The query will look for keywords containing any of the terms: ‘analygo’, ‘hamza’, or ‘kharraz’. if there are any matches, it will return the word ‘Branded’.

Notice that we added the having query is not null to avoid getting rows with empty queries.

⚠️ Make sure to select the same data range from GA4/GSC data

You need to select the same date range to avoid getting incoherent data.

Joining GSC and GA4 data

The final step is very straightforward. We will take the two previous queries and join them, extracting only the fields we want.

-- Combine GSC with Bigquery
select 
  ga4_data.date,
  gsc_data.country,
  ga4_data.landing_page,
  ga4_data.unique_sessions,
  gsc_data.query,
  gsc_data.branded_query,
  gsc_data.impressions,
  gsc_data.clicks,
  gsc_data.avg_position

from ga4_data
left join gsc_data on gsc_data.date=ga4_data.date
and ga4_data.landing_page=gsc_data.landing_page;
SQL

We used the landing_page and date fields as the join keys for our query.

Google Search Console & GA4 table

Visualize BigQuery results in Looker Studio

Let’s create a scheduled query sot that we can use our in query in Looker studio.

Google Search Console & GA4

Now we can use the fields within Looker studio to create all kinds of data visualizations.

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Go Further