Making RudderStack Ad-Blocker Proof in 66 Lines of Code

By Max Werner, published: 2021-08-26, Updated: 2021-11-13

Person holding chart and bar graph. Photo by Lukas from Pexels

Today we’ll look at how to make sure that the RudderStack SDK cannot be blocked by ad-blockers or a browser’s privacy features, whether you use hosted RudderStack or self-host it. Either way, the secret ingredient is CloudFlare Workers!

What Does RudderStack’s JavaScript SDK Need?

In order to function, the RudderStack JavaScript SDK needs to do three things:

  • Load the JS SDK on the page
  • Retrieve the source config from the RudderStack Data Plane
  • Send AJAX POST requests to your Data Plane URL for identify, track, page, group calls and so on

How to Prevent Ad-Blocking of Your Analytics and Data Collection

That’s quite simple: don’t get the SDK from cdn.rudderlabs.com and don’t send requests to *.dataplane.rudderstack.com. We’ll do this by proxying the requests through a CloudFlare worker. And if your site is hosted through CloudFlare, you can very easily run the worker off a subdomain too!

We’ll do this in the following easy steps:

  1. Set up a CloudFlare worker with the CloudFlare worker CLI
  2. Add the code required (all 66 lines of it)
  3. Publish the worker
  4. Attach it to a subdomain (optional)
  5. Configure CloudFlare’s SSL Settings (optional)
  6. Configure our website’s JavaScript implementation to use our URLs

Step 1: Setting up CloudFlare Workers

CloudFlare’s documentation is quite easy to follow and can be found here. The command line steps are:

npm install -g @cloudflare/wrangler
wrangler login
wrangler generate my-rudder-proxy

This will generate a my-rudder-proxy directory containing all the code required for the CloudFlare worker setup.

Step 2: Add the code required (all 66 lines of it)

Open the worker’s index.js file and replace its contents with the contents of this GitHub Gist. Then replace the Data Plane URL in line 7 with yours and that’s it.

If you’re interested in why this is so easy, read on, otherwise skip down to the next step.

This works like a charm for two important reasons:

(1) RudderStack lets you configure where it will retrieve the source config from in its load method, like so:

rudderanalytics.load(
RUDDER_WRITE_KEY, 
DATA_PLANE_URL, 
{configUrl: SOME_URL}
);

The RudderStack JS SDK automatically appends ‘/sourceConfig’ to this SOME_URL. As you can see in the Gist’s code, the worker’s ‘/sourceConfig’ endpoint forwards that call to RudderStack. Your write key travels base64-encoded in the request’s Authorization header, which is why the worker simply passes the request headers on. In other words, the browser calls “your” URL (the worker), the worker fetches the source config from the real RudderStack, and then hands it back to the browser. Since the browser’s request goes to your own subdomain, it does not match the tracker lists that browsers and ad-blocker extensions maintain.
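To make the Authorization header part more concrete, here is a small sketch of what such a header could look like. It assumes the SDK uses the HTTP Basic scheme with the write key as the username and an empty password; the helper names buildAuthHeader and writeKeyFromHeader are made up for illustration and are not part of the SDK.

```javascript
// Assumption: HTTP Basic scheme, write key as username, empty password.
// Both helper names are hypothetical and only illustrate the encoding.

// Build the header value a client would send for a given write key.
function buildAuthHeader(writeKey) {
  return 'Basic ' + btoa(writeKey + ':');
}

// Recover the write key from such a header value.
function writeKeyFromHeader(headerValue) {
  const encoded = headerValue.replace(/^Basic\s+/i, '');
  // Strip the trailing colon that separates the (empty) password.
  return atob(encoded).replace(/:$/, '');
}
```

The worker never needs to decode anything. Because the key already sits in the headers, forwarding the headers untouched is enough.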

(2) RudderStack requests can go anywhere; that’s what your DATA_PLANE_URL is for. So we simply point the Data Plane URL at our worker, which forwards each request to the actual RudderStack Data Plane URL.
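If you’d like to see the idea without opening the Gist, here is a minimal sketch of such a worker. It is not the Gist’s code, just an illustration of the same routing. The upstream URLs are placeholders, CONFIG_UPSTREAM is an assumption (point it at wherever your source config is actually served), and the function names resolveUpstream and handleRequest are hypothetical.

```javascript
// Placeholder upstreams; replace them with your own values.
const DATA_PLANE_URL = 'https://example.dataplane.rudderstack.com';
// Assumption: the source config is fetched from the same host.
const CONFIG_UPSTREAM = DATA_PLANE_URL;
const SDK_URL = 'https://cdn.rudderlabs.com/v1/rudder-analytics.min.js';

// Map an incoming worker URL to the upstream URL it should be proxied to.
function resolveUpstream(requestUrl) {
  const url = new URL(requestUrl);
  const path = url.pathname.replace(/\/+$/, ''); // ignore trailing slashes
  if (path === '/dataPlane') return SDK_URL; // serves the JS SDK
  if (path === '/sourceConfig') {
    return CONFIG_UPSTREAM + '/sourceConfig' + url.search;
  }
  // Everything else (identify, track, page, ...) goes to the Data Plane.
  return DATA_PLANE_URL + url.pathname + url.search;
}

// Forward the request with its original method, headers and body.
async function handleRequest(request) {
  const target = resolveUpstream(request.url);
  return fetch(new Request(target, request));
}

// Only register the handler where a global addEventListener exists,
// as it does in the Workers runtime.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```

Keeping the URL mapping in its own function means the routing can be checked locally without deploying anything.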

Step 3: Publish the worker

Open your wrangler.toml file and update it to be this:

name = "rsp"
type = "webpack"

account_id = "YOUR_ACCOUNT_ID"
workers_dev = true

[env.production]
route = "yoursub.domain.com/*"
zone_id = "YOUR_ZONE_ID"

You can retrieve the account_id and zone_id from your CloudFlare dashboard. As for the route, simply choose the subdomain you want to run this through. In my case I use rsp.obsessiveanalytics.com (rsp standing for RudderStack Proxy ;)).

That’s it as far as the configuration goes. Simply run wrangler publish --env production and it’ll handle the rest for you.

Step 4: Attach it to a subdomain (optional)

This is quite simple. All you need to do is create a DNS record for the subdomain that resolves to anything. In my case I use rsp as my subdomain, so I simply add an AAAA record for it pointing to the IPv6 placeholder 100:: and an A record pointing to 192.0.2.1. The details for this can be found here but suffice to say, this will work.

Step 5: CloudFlare’s SSL Settings (optional)

This is only required if you want to use your own subdomain. If you want to run the worker through a CloudFlare provided *.workers.dev subdomain, you don’t need this step.

You will get Status Code 525 errors if your SSL settings are not set to Full or Full (strict). So simply set the SSL settings to either and we’re done here.
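A quick way to see whether you’re hitting this is to request the worker and look at the status code. The following is a rough sketch under a few assumptions: the worker serves the SDK at /dataPlane, a global fetch is available, and the names describeStatus and checkProxy are made up for this example.

```javascript
// Translate a status code from the proxy into a short diagnosis.
function describeStatus(status) {
  if (status === 525) return 'ssl-handshake-failed'; // check the SSL mode
  if (status >= 200 && status < 300) return 'ok';
  return 'unexpected-status-' + status;
}

// Request the SDK through the proxy and report what came back.
// baseUrl is your worker's origin, e.g. 'https://foo.bar.com'.
async function checkProxy(baseUrl) {
  const res = await fetch(new URL('/dataPlane', baseUrl).toString());
  return describeStatus(res.status);
}
```

Calling checkProxy('https://foo.bar.com') from a browser console or a recent Node version should resolve to 'ok' once the SSL settings are correct.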

Screenshot of CloudFlare's ssl settings page

BE CAREFUL

This can break connections to your origin servers on the same domain, if you have any. If your SSL mode is set to Flexible (the default) and you switch to Full or Full (strict), your origin servers need either self-signed or CloudFlare-provided SSL certs. Without them, your origin servers are seen as “down”. This has nothing to do with the workers setup; it affects other servers you might have serving things for the same domain. If you don’t know what this means, skip steps 4 and 5 and just use the .workers.dev domain. It’ll work, although it’ll be less “clean”, as ad-blockers might include such public domains at some point in the future!

Step 6: Configure our website’s JavaScript implementation to use our URLs

Thankfully this part is quite easy too. For the purposes of this article we’ll assume that your worker’s subdomain is foo.bar.com.

In your website’s code, instead of adding a script tag for https://cdn.rudderlabs.com/v1/rudder-analytics.min.js, add one for https://foo.bar.com/dataPlane.

In your RudderStack initialization code, change

rudderanalytics.load(
RUDDER_WRITE_KEY, 
DATA_PLANE_URL
);

to

rudderanalytics.load(
RUDDER_WRITE_KEY, 
'https://foo.bar.com', 
{configUrl: 'https://foo.bar.com'}
);

The script tag ensures that you still get the JS SDK code, but technically from your own domain rather than RudderStack’s. That request isn’t blocked because you’re simply including a script from your own domain. What could be more innocuous? ;)

The load() changes ensure that your data is sent to your own domain (foo.bar.com instead of something.dataplane.rudderstack.com) AND that the source config is retrieved from https://foo.bar.com/sourceConfig/ as opposed to https://something.dataplane.rudderstack.com/sourceConfig/. That’s it!