Data Sampling
In data analysis, sampling is the practice of analyzing a subset of all data in order to uncover meaningful information in the larger data set.
Data sampling is a standard statistical technique, and you may have encountered it in Google Analytics (both Universal Analytics and Google Analytics 4) and in Google Search Console.
For example, suppose you want to estimate the number of trees in a 100-acre area where the trees are distributed fairly uniformly. To estimate the total for the entire 100 acres, you could:
- Count the trees in 1 acre and multiply by 100, or
- Count the trees in half an acre and multiply by 200.
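The tree-counting approach above can be sketched as a small simulation. This is an illustration only: the density of 400 trees per acre, the 10 x 10 layout (where a 1 x 1 cell is one acre), and the function names are all hypothetical. Both estimates scale a plot count by total area divided by plot area, so they land close to the true total without matching it exactly.

```python
import random

TOTAL_ACRES = 100      # the area from the example
TREES_PER_ACRE = 400   # hypothetical density, chosen only for illustration


def plant_forest(seed: int = 0) -> list[tuple[float, float]]:
    """Scatter trees uniformly over a 10 x 10 square, where a 1 x 1 cell is one acre."""
    rng = random.Random(seed)
    n_trees = TOTAL_ACRES * TREES_PER_ACRE
    return [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n_trees)]


def estimate_total(trees: list[tuple[float, float]], width: float, height: float) -> float:
    """Count trees in the plot [0, width) x [0, height) and scale by total area / plot area."""
    plot_area = width * height
    count = sum(1 for x, y in trees if x < width and y < height)
    return count * (TOTAL_ACRES / plot_area)


forest = plant_forest()
print("actual:", len(forest))
print("1 acre x 100:", estimate_total(forest, 1.0, 1.0))
print("half acre x 200:", estimate_total(forest, 1.0, 0.5))
```

A smaller plot is cheaper to count, but each tree in it carries more weight in the final estimate, so the result tends to vary more from sample to sample.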
Google Analytics
Google Analytics needs to process large amounts of data quickly while maintaining accuracy, which is why it randomly samples a subset of your data.
However, a sample is not guaranteed to represent your data perfectly, and the estimates drawn from it can differ from the true values, especially for small segments or rare events. As a result, sampled reports carry a degree of uncertainty.
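To show why small segments are the most affected, here is a minimal sketch that uses simple random sampling and scales counts up by 1 / sampling rate. The session data, the 10% rate, and the helper names are hypothetical, and this is not a description of Google Analytics' internal sampling method. A sample count for a rare segment is small, so its estimate swings much more from one sample to the next.

```python
import random


def make_sessions(n: int, seed: int = 1) -> list[dict]:
    """Hypothetical sessions: about 2% convert, about 1% belong to a small 'app' segment."""
    rng = random.Random(seed)
    return [
        {"converted": rng.random() < 0.02,
         "segment": "app" if rng.random() < 0.01 else "web"}
        for _ in range(n)
    ]


def sampled_estimate(sessions: list[dict], predicate, rate: float, seed: int) -> float:
    """Count matches in a simple random sample, then scale up by 1 / rate."""
    rng = random.Random(seed)
    sample = rng.sample(sessions, int(len(sessions) * rate))
    return sum(1 for s in sample if predicate(s)) / rate


def mean_relative_error(sessions: list[dict], predicate, rate: float = 0.1, trials: int = 20) -> float:
    """Average |estimate - truth| / truth over several independent samples."""
    truth = sum(1 for s in sessions if predicate(s))
    errors = [abs(sampled_estimate(sessions, predicate, rate, seed=t) - truth) / truth
              for t in range(trials)]
    return sum(errors) / trials


def all_conversions(session: dict) -> bool:
    return session["converted"]


def app_conversions(session: dict) -> bool:
    return session["converted"] and session["segment"] == "app"


sessions = make_sessions(200_000)
print("all conversions:", mean_relative_error(sessions, all_conversions))
print("app-segment conversions:", mean_relative_error(sessions, app_conversions))
```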
Sampling Thresholds
Default reports are not subject to sampling.
Ad-hoc queries of your data are subject to the following general thresholds for sampling:
- Analytics Standard: 500k sessions at the property level for your date range
- Analytics 360: 100M sessions at the view level for your date range. Queries may include events, custom variables, and custom dimensions and metrics; all other queries have a threshold of 1M.
Historical data is limited to 14 months (on a rolling basis).
The following cardinality limits also apply:
- Daily processed tables: 50k-row limit for Universal Analytics, 75k-row limit for Google Analytics 360
- Multi-day processed tables: 100k-row limit for Universal Analytics, 150k-row limit for Google Analytics 360
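Universal Analytics generally reports values that exceed these row limits under a single aggregated "(other)" row. The sketch below models that behavior under simplifying assumptions: which rows are kept (here, the largest ones) and whether "(other)" counts toward the limit (here, it does) are hypothetical choices, and the real selection logic may differ. The function name is also hypothetical.

```python
DAILY_LIMIT_UA = 50_000  # daily processed tables, Universal Analytics (from the list above)


def apply_row_limit(rows: dict[str, int], limit: int) -> dict[str, int]:
    """Simplified model of a processed-table row limit.

    Keeps the `limit - 1` largest rows and folds everything else into a single
    '(other)' row, so the table never exceeds `limit` rows and totals are preserved.
    """
    if len(rows) <= limit:
        return dict(rows)
    ranked = sorted(rows.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[: limit - 1])
    kept["(other)"] = sum(value for _, value in ranked[limit - 1:])
    return kept


pages = {"/home": 10, "/pricing": 5, "/blog/a": 3, "/blog/b": 1}
print(apply_row_limit(pages, limit=3))
# {'/home': 10, '/pricing': 5, '(other)': 4}

many_pages = {f"/page/{i}": 1 for i in range(60_000)}
print(len(apply_row_limit(many_pages, DAILY_LIMIT_UA)))
# 50000
```

The row count drops, but the session total is unchanged; the detail for the long tail of values is what gets lost.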
In some cases, fewer sessions than the threshold may be sampled. This can be due to:
- The complexity of your Analytics implementation,
- The use of view filters,
- Query complexity for segmentation, or
- Some combination of the above.
Google Analytics attempts to sample up to the thresholds described above, but an ad-hoc query may still return slightly fewer sessions.
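As a rough rule of thumb, the session thresholds translate into a sampling rate: if a date range contains more sessions than the threshold, an ad-hoc query reads roughly threshold / total sessions. The sketch below assumes the sample is sized to exactly the threshold, which is an upper bound given that fewer sessions can be returned. It models only the 500k (Standard) and 100M (Analytics 360) figures, not the 1M threshold for other 360 queries, and the function name is hypothetical.

```python
# Thresholds from the list above (sessions in the selected date range).
THRESHOLDS = {
    "standard": 500_000,     # Analytics Standard, property level
    "360": 100_000_000,      # Analytics 360, view level
}


def sampling_rate(sessions_in_range: int, tier: str) -> float:
    """Approximate share of sessions an ad-hoc query reads.

    Assumes the sample is sized to exactly the threshold; in practice the
    returned sample can be slightly smaller.
    """
    threshold = THRESHOLDS[tier]
    if sessions_in_range <= threshold:
        return 1.0  # under the threshold: no sampling
    return threshold / sessions_in_range


print(sampling_rate(400_000, "standard"))    # 1.0
print(sampling_rate(2_000_000, "standard"))  # 0.25
print(sampling_rate(2_000_000, "360"))       # 1.0
```

Shortening the date range is the simplest way to push a query back under the threshold, because the session count is measured over the selected range.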
Google Analytics 4
The default reports (under the Reports snapshot tab) are not sampled, even when you add secondary dimensions, segments, or filters.
Sampling may occur when you create an advanced analysis, such as a cohort analysis, an exploration, a segment overlap, or a funnel analysis.
For more information, please see the official Google Analytics documentation.
Data Sampling in Google Search Console
Google Search Console handles large data sets in a similar way to Google Analytics.
When you group by page and/or query, Google Search Console may omit some data so that it can calculate results as quickly as possible with a reasonable amount of computing resources.
To get exact data, group your data only by date, device, and/or country. The totals (as well as the individual rows) will then match the data you see in the GSC UI.
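If you pull data through the Search Console API, the grouping is set by the `dimensions` field of the `searchanalytics.query` request body, alongside `startDate` and `endDate`. The helper below is a hypothetical sketch that builds such a body and flags whether the grouping stays within date, device, and country. Authentication and sending the request are out of scope here.

```python
# Dimensions that, per the guidance above, keep totals and rows exact.
EXACT_DIMENSIONS = {"date", "device", "country"}


def build_query_body(start_date: str, end_date: str, dimensions: list[str]) -> tuple[dict, bool]:
    """Return a Search Analytics request body and whether its grouping should be exact.

    Grouping by anything outside EXACT_DIMENSIONS (for example "page" or "query")
    may omit data, so totals may not match the UI.
    """
    exact = set(dimensions) <= EXACT_DIMENSIONS
    body = {
        "startDate": start_date,   # YYYY-MM-DD
        "endDate": end_date,       # YYYY-MM-DD
        "dimensions": list(dimensions),
    }
    return body, exact


body, exact = build_query_body("2024-01-01", "2024-01-31", ["date", "device"])
print(body, exact)
```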
If you need to see trends per page and/or query, keep in mind that the totals will not match the UI exactly.
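One way to gauge how much data a page- or query-level breakdown leaves out is to compare its summed rows with the total from a date-only query for the same period. The sketch below shows that comparison with hypothetical numbers; the function name and the row format are assumptions for illustration.

```python
def coverage(total_clicks: int, grouped_rows: list[dict]) -> float:
    """Share of the ungrouped total that per-page/per-query rows account for."""
    if total_clicks == 0:
        return 1.0
    return sum(row["clicks"] for row in grouped_rows) / total_clicks


# Hypothetical numbers: a date-only query reports 1,000 clicks,
# while a query-level breakdown for the same day returns only these rows.
total = 1_000
by_query = [
    {"query": "data sampling", "clicks": 600},
    {"query": "ga4 sampling", "clicks": 250},
]
print(f"{coverage(total, by_query):.0%} of clicks are attributed to individual queries")
```

Tracking this ratio over time helps you tell whether a change in a per-query trend reflects real traffic or just a change in how much data was omitted.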
For more information, see the official Google Search Console documentation.