You should not be splitting nested event tables of GA4 BigQuery into separate tables (like sessions, transactions and users) just for the sake of simplicity esp. for large datasets (more than 1 million rows). It’s a very bad move that will come back to bite you.
The following are the key reasons why this practice is not recommended:
- Nested structure in GA4 BigQuery exports is critical for maintaining data integrity and context.
- Splitting GA4’s events table can create more problems than it solves.
- Maintaining nested data in BigQuery is a better approach than relying on denormalised structures.
- You could double or triple your storage cost by splitting the events table.
- GA4’s nested structure is future-proof.
- Automated SQL generation via generative AI makes splitting tables obsolete.
#1 Nested structure in GA4 BigQuery exports is critical for maintaining data integrity and context.
GA4’s event data model is designed to maintain relationships between different types of data (event data, event parameters, session data, user data, etc) through nested fields.
Nested fields are useful because of the following reasons:
- They link events with users, sessions, and event parameters.
- They preserve relationships between different event types (e.g., page_view, add_to_cart, purchase).
- They maintain event-level granularity, which is essential for advanced data analysis.
- They maintain user context across multiple sessions and devices.
- They enable tracking event sequences, user journeys, and conversion paths without needing to join multiple tables.
When you split the nested events table into separate tables (like sessions, transactions, and users):
- You lose the direct relationships between events, users, and parameters.
- You break the event sequence context, making it difficult to track how events relate to each other within a session or across sessions.
- You are forced to aggregate data too early, causing a loss of event-level granularity that is essential for advanced pathing and attribution models.
- User context is fragmented across multiple tables when you split the data. Cross-device journeys and multi-session interactions become difficult to analyse.
- Splitting tables remove the nested structure that makes it easy to track event sequences and conversion paths.
#1.1 You lose the direct relationships between events, users, and parameters.
As a result, you will need to recreate these links manually through complex and costly joins, increasing query complexity and the risk of errors.
For example, if a user views a product (page_view), adds it to their cart (add_to_cart), and then completes a purchase (purchase), you won’t be able to easily link these events to the same user without rejoining the split tables.
#1.2 You break the event sequence context, making it difficult to track how events relate to each other within a session or across sessions.
As a result, you lose the ability to analyse conversion paths and funnel progressions without reconstructing the relationships through joins.
For example, tracking a user’s journey from homepage → product page → add to cart → checkout → purchase becomes fragmented when events are stored in separate tables.
#1.3 You are forced to aggregate data too early, causing a loss of event-level granularity that is essential for advanced pathing and attribution models.
As a result, you won’t be able to drill down into specific event details or analyse multi-event sequences without querying the raw events table.
For example, if you want to calculate the time between events (e.g., from ‘add_to_cart’ to ‘purchase)’, you will lose this capability because event timestamps are no longer directly accessible.
#1.4 User context is fragmented across multiple tables when you split the data. Cross-device journeys and multi-session interactions become difficult to analyse.
As a result, you will need to stitch user data manually by rejoining tables, which increases complexity and can introduce gaps in user tracking.
For example, if a user starts a session on a mobile device and completes the purchase on a desktop, it will be difficult to track this cross-device journey without the nested event data.
#1.5 Splitting tables remove the nested structure that makes it easy to track event sequences and conversion paths.
Instead, you will need to join multiple tables to rebuild the path. As a result, path analysis becomes fragmented and expensive. You will need to perform multiple joins to track multi-step funnels and user journeys.
For example, analysing a multi-step funnel (e.g., homepage → product page → add to cart → checkout → purchase) will require multiple joins to track user progression across different tables.
#2 Splitting GA4’s events table can create more problems than it solves.
The appeal of smaller, specialised tables comes from the idea that it is easier to query a table with session-level or user-level data rather than handling nested event data.
However, here is the reality:
- Splitting tables creates fragmented data that needs to be joined back together to perform meaningful analysis. These joins are expensive and complex, especially for large datasets (over 1 million rows).
- BigQuery is specifically optimised for working with nested and repeated fields, meaning keeping your data nested is more performant and cost-efficient over time. Stop this obsession with working with flat tables. You are not in kindergarten :)
With smaller and specialised tables, you might end up writing more complex queries to join data back together, which could be inefficient, especially for large datasets.
#3 Maintaining nested data in BigQuery is a better approach than relying on denormalised structures.
A denormalised structure means you are splitting the nested events table into multiple flat tables (e.g., sessions, transactions, and users) for easier querying.
Denormalising data may seem like a good idea initially, but it comes with long-term scalability and maintenance issues, especially as datasets grow.
As your dataset grows, this approach introduces the following problems:
- Increased storage cost.
- Increased maintenance overhead.
- Data consistency issues.
- Increased query complexity.
- Harder to handle schema changes.
#3.1 Increased storage cost.
Splitting the data into multiple tables often results in data duplication (e.g., user IDs or session IDs stored in both sessions and transactions tables), which results in higher storage costs, esp. for large datasets.
#3.2 Increased maintenance overhead.
Maintaining separate tables requires more data pipelines to ensure that each table is accurately updated.
#3.3 Data consistency issues.
Keeping consistent data across split tables becomes challenging, especially when dealing with real-time updates or incremental loads. There is a risk of inconsistent reporting due to mismatched data between tables.
#3.4 Increased query complexity.
Queries on denormalised structures require more complex joins to connect data across tables, which results in higher query costs and slower performance in BigQuery, especially for large datasets.
#3.5 Harder to handle schema changes.
When your business logic or data model changes, you will need to update multiple tables and data pipelines, which increases development time and makes the data pipeline less agile.
#4 You could double or triple your storage cost by splitting the events table.
Splitting a 1-million-row events table into several tables can double or triple your storage costs. Each query that joins the split tables will consume more processing power.
For example, if your dataset grows to 10 million rows, querying split tables will be significantly more expensive and slower than querying a single events table with optimised filtering.
Splitting data into more tables might mean you are querying more data overall, especially if you are frequently joining these tables back together for analysis.
#5 GA4’s nested structure is future-proof.
The nested structure in GA4 is designed to be flexible and adaptable as new features and analysis capabilities are introduced.
New GA4 BigQuery features can be added without needing to change the underlying data structure.
In contrast, splitting tables into sessions, transactions, and users locks you into a rigid structure that might not be compatible with future GA4 BigQuery features.
#6 Automated SQL generation via generative AI makes splitting tables obsolete.
When you can completely automate SQL generation (based on natural language input), manually writing SQL is no longer an issue, and this eliminates the main reason for splitting tables in GA4 BigQuery for simpler query writing.
The complexity of writing queries for nested structures and repeated fields is no longer a barrier.
ChatGPT can generate efficient queries that leverage BigQuery’s capabilities with nested and repeated fields, potentially making these queries just as simple or even simpler to construct and execute than those for denormalized data.
Any benefits that you get by splitting the events_ tables are outweighed by the downsides, especially in terms of accuracy, flexibility, performance, and cost-efficiency.
Splitting the tables may improve ease of use, but it introduces data processing inefficiencies and limits flexibility for advanced use cases.
And when you can automate SQL generation, manually writing SQL is no longer an issue, and this completely eliminates the main reason for splitting tables in GA4 BigQuery.
If you have to work with flat tables, you might as well start using Excel again.
Why? Because you are destroying the core advantages of BigQuery and turning your advanced analytics setup into a manual spreadsheet process.
What you should be doing instead
>> Embrace Nested Structures – They are designed to handle complex, hierarchical data.
>> Use Views for Aggregations – Instead of splitting tables, use views to create session-level or user-level summaries when needed.
>> Leverage SQL Automation Tools – Tools like GA4 BigQuery Composer can automate complex queries, making it unnecessary to split tables for simplicity.
