Certificate Transparency Logs

Certificate Transparency (CT) logs are a never-ending flow of untapped potential and, once tamed, can provide interesting statistics, valuable insight and an early warning sign of emerging threats (*kind of). It is something I’ve been meaning to dive into for some time now, however I have only recently found the time, albeit on a Saturday night.

My primary goal was to explore CT logs, deploy the Elastic Stack and showcase how the two could be set up together to explore these logs. Some notable use cases for this could be:

  • Subdomain monitoring for an organisation.

  • Bug bounty - An interesting use case: keeping an eye on new subdomains that fall within a bug bounty program. Get in quick for the goodies.

  • Hunting for phish - This could be used as an early warning for incoming phishing attempts. Pair it with dnstwist or something similar to do the domain permutation generation.

  • Interesting statistics and insights?

Read on to see the setup of elastic stack, parsing and exploration of Certificate Transparency Logs.

TL;DR

This post covers how CT logs work, how to consume the CT log stream, and how to parse it and bring it into the Elastic Stack for exploration.

This post does go through the setup of certstream -> filebeat -> logstash -> elasticsearch -> kibana, an interesting exercise in its own right.

Skip through the primer on how CT logs work if you are familiar with them.

How do Certificate Transparency Logs work?

Certificate Transparency logs are a public, append-only ledger of certificate registration records. These are cryptographically secure and easily searchable. The primary purpose of CT logs is to ensure the legitimacy of certificate registrations by providing a mechanism for auditing and verification.

For certificates issued after April 2018, Chrome requires the certificate to be disclosed via Certificate Transparency logs.

The graphic below depicts the general flow of certificate registration and how domain owners, certificate authorities, log operators and website visitors interact.

There is one key piece of the puzzle we have yet to touch on to understand the complete picture of certificate transparency logs, and that is how they are stored. In short, they are Merkle Trees.

Merkle Trees, typically implemented as binary trees, are data structures based on hashed data. Each leaf node is a hash of a block of data with each non-leaf node being a hash of its children.
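To make that concrete, here is a toy Python sketch of computing a Merkle root by hashing pairs of nodes upwards. It is purely illustrative (the function name and inputs are made up for this example) and is not the exact RFC 6962 construction CT logs use:

import hashlib

def merkle_root(leaves):
    # Hash each block of data to create the leaf nodes
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node when the count is odd
        # Each parent node is the hash of its two children concatenated
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

print(merkle_root([b"cert-1", b"cert-2", b"cert-3"]).hex())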

Dig into them if you are interested; for this post we don’t necessarily need to know the specifics.

Accessing CT Logs

So how do we access these logs? The team behind Cali Dog Security have created CertStream, letting anyone leverage their libraries to interact with the certificate transparency network. This allows you to consume the CT feed with minimal effort.

Check out Certstream here: https://certstream.calidog.io/ and their python library here https://github.com/CaliDog/certstream-python
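If you prefer Python, a minimal consumer using the certstream library looks roughly like the below; the callback structure follows the library’s documented usage, and the fields accessed assume the standard certstream message format:

import certstream

def print_callback(message, context):
    # Each update describes one certificate observed on the CT network
    if message["message_type"] == "certificate_update":
        print(message["data"]["leaf_cert"]["all_domains"])

certstream.listen_for_events(print_callback, url="wss://certstream.calidog.io/")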

We will keep it even simpler by skipping the python part and consuming the feed with websocat (https://github.com/vi/websocat). Assuming you have websocat installed, the command below will pull from certstream directly.

websocat -t - autoreconnect:wss://certstream.calidog.io/

Elastic Stack Setup

The flow we are looking to set up is shown above at a high level. To do this we will need to get the Elastic Stack up and running.

Elastic Stack setup is straightforward for the most basic deployment, though this setup isn’t fit for a production environment. Follow the official guide from Elastic https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html or see the alternate DigitalOcean guide linked in the references. Commands are below; I’m installing on Ubuntu 22.04.2 LTS.

As it’s well documented I’ll only provide a brief overview of the installation.

Elasticsearch Installation
The following commands will set up some prerequisites we need for the deployment (nginx as a reverse proxy with basic authentication) as well as install Elasticsearch.

sudo apt install nginx
echo "kibanaadmin:'openssl passwd -apr1'" | sudo tee -a /etc/nginx/htpasswd.users
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch |sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install elasticsearch
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
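One caveat worth flagging: the 8.x packages enable TLS and authentication by default, while the rest of this walkthrough talks to Elasticsearch over plain http://localhost:9200 without credentials. For a throwaway demo you could relax this in /etc/elasticsearch/elasticsearch.yml as sketched below (these settings are my assumption for a demo-only box; a better option is to keep security on and add credentials to the logstash outputs), then restart elasticsearch.

#demo only - do not do this in production
xpack.security.enabled: false
xpack.security.http.ssl.enabled: false

sudo systemctl restart elasticsearch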

sudo nano /etc/nginx/sites-available/<your_domain>

#add the following to the nginx site file
server {
    listen 80;

    server_name <your_domain>;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}

sudo ln -s /etc/nginx/sites-available/<your_domain> /etc/nginx/sites-enabled/<your_domain>
sudo systemctl reload nginx
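If nginx refuses to reload, a quick configuration syntax check usually points straight at the problem:

sudo nginx -t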

Kibana Installation
The following commands will install kibana.

sudo apt install kibana
sudo systemctl enable kibana
sudo systemctl start kibana

Check your Kibana installation by navigating to http://<domain>/. If you want to take this a step further you can set up HTTPS and authentication.

Beats Installation

Installation will be similar to the other components. When editing the configuration file we will have to modify the input and output types. In our case we will ingest from stdin, as that’s where we will be receiving the certstream JSON, and output to logstash for parsing.

sudo apt install filebeat
sudo nano /etc/filebeat/filebeat.yml

#In the filebeat.yml file make changes to the following lines
#(filebeat only allows one output, so comment out the default output.elasticsearch section)
filebeat.inputs:
- type: stdin
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
  tags: ["json"]

output.logstash:
  hosts: ["localhost:5044"]

setup.ilm.enabled: false
setup.template.enabled: false

See the filebeat stdin input documentation for reference: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-stdin.html
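Before moving on, filebeat can sanity-check both its configuration and (once logstash is up in the next step) its connection to the configured output:

sudo filebeat test config
sudo filebeat test output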

Logstash Installation

The following commands will install Logstash and create a configuration file for ingesting certstream data. We will do our parsing and sorting with logstash in this file.

sudo apt install logstash
sudo nano /etc/logstash/conf.d/certstream-input.conf

There are many ways this could have been done, including pre-Elastic-Stack using python or with ingest pipelines within elasticsearch. I thought it could be fun to explore some of the capabilities of logstash, so decided to try it there.

I’ve done my primary filtering using regex; the example provided breaks data out into different indices within elasticsearch. I’ve also written a few filters to catch interesting domains. You can expand on this and adapt it as you see fit; I’ve kept it simple for demo purposes.

input {
        beats {
                port => 5044
        }
}

filter {
        #parse the JSON document forwarded by filebeat into fields
        json {
                source => "message"
        }
}

output {
        #route .ml / .tk common names to their own index
        if [data][leaf_cert][subject][CN] =~ /.*\.ml/ or [data][leaf_cert][subject][CN] =~ /.*\.tk/ {
                elasticsearch {
                        hosts => "http://localhost:9200"
                        index => "interesting-domains"
                }
                elasticsearch {
                        hosts => "http://localhost:9200"
                        index => "all-domains"
                }
        } else if [data][leaf_cert][subject][CN] =~ /^.*[0-9]\./ {
                #common names with a digit immediately before a dot
                elasticsearch {
                        hosts => "http://localhost:9200"
                        index => "number-only-domains"
                }
                elasticsearch {
                        hosts => "http://localhost:9200"
                        index => "all-domains"
                }
        } else if [data][leaf_cert][subject][CN] =~ /xn\-/ {
                #punycode (xn--) encoded common names
                elasticsearch {
                        hosts => "http://localhost:9200"
                        index => "punycode-domains"
                }
                elasticsearch {
                        hosts => "http://localhost:9200"
                        index => "all-domains"
                }
        }

        stdout {}
}
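Logstash can also validate the configuration file without starting the pipeline, which is a handy check before wiring everything together (the path assumes the config file created above):

sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/certstream-input.conf --config.test_and_exit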

Testing the Setup


Run the following in two different terminals. The first will start logstash with the config file we have built; the second will pull data down from certstream using websocat and pipe it directly into the filebeat we set up for log ingestion.

sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/<yourconfigfile>.conf 
sudo websocat -t - autoreconnect:wss://certstream.calidog.io | sudo filebeat -c /etc/filebeat/<yourconfigfile>.yml -e

If you now navigate to your Kibana console (http://<kibanadomain>/) and go to Stack Management -> Index Management -> Indices, you should see multiple indices corresponding to the logstash outputs we configured.

Finalising the setup

With the above in place we can now do a few final things within kibana/elasticsearch. The minimum amount of configuration you will need is to create an index pattern.

Once these index patterns are set up you should be able to navigate to Discover under the Analytics section of the Kibana menu and start exploring the data.

One thing to note: you can extend the setup further by creating field mappings and Index Lifecycle Policies to age out and archive data at time or size intervals. As this is just a test instance and I’ll only be collecting for a few days, I’ve kept it simple and left everything as is.

There is also no need to ingest every field. If I were to set this up again I’d drop the majority of fields to reduce the size of the data set.
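If you did want to trim the ingest down, one hedged sketch is a mutate/remove_field in the logstash filter block. The field names below are only illustrative guesses at bulky certstream fields, so check your own documents before pruning:

filter {
        json {
                source => "message"
        }
        #drop large fields we never query to keep the indices small
        mutate {
                remove_field => ["[data][leaf_cert][as_der]", "[data][chain]"]
        }
}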

Exploring The Data

Now that we’ve been collecting logs for about 3 days we can start exploring the data. Throwing together a quick dashboard overview shows over 13 million records collected, not bad for a deployment on a NUC. These visualisations were put together using Kibana’s visualisations and “Lens” feature.


Throughout the collection period we were ingesting, on average, 180,000 certificate logs per hour. Sizable, but nothing too crazy.

Taking a closer look at cert issuers, it’s no surprise that Let’s Encrypt is ahead in certs issued by quite a lot. All the other big players are there as expected.

Remember the logstash configuration we set up previously? We can now use those indices to explore subsets of data that might be interesting.

In the graph below I’ve taken all the punycode domains (https://en.wikipedia.org/wiki/Punycode) seen in cert registrations and broken out some top-level domains. One idea could be to pull out top-level domains less likely to contain punycode to search for potential phishing sites.


Of course, you can also dive into the data directly, building your own searches using KQL (Kibana Query Language). You could write queries for your particular use case; below are all *.apple.com registered certificates.
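The query itself is a short KQL expression; assuming the same subject CN field used in the logstash configuration (the exact field name depends on your mapping), it would look something like:

data.leaf_cert.subject.CN : *.apple.com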
