Alain shares a methodology for discovering and identifying Gophish deployments in the wild. How easy is your Gophish installation to spot?
Gophish is an open source phishing framework created by Jordan Wright that is widely used by both internal security teams and security consultancies to perform phishing awareness exercises. Gophish is one of several phishing frameworks we use at Insomnia Security because it’s an excellent tool that is simple to deploy and easy to use.
When evaluating third-party tools such as Gophish, a key consideration for us is determining the tool’s susceptibility to external fingerprinting. This evaluation process is important because a tool with an obvious fingerprint could unnecessarily expose our attack infrastructure.
For the average phishing awareness campaign, it’s probably not a big deal if our phishing infrastructure is discovered by the blue team or security researchers. However, during an exercise where operational security is important we want to know how safe a tool is to use and what precautions we need to take while using it.
Gophish exposes a relatively small external attack surface by default. If we look at the registered route handlers in the source code we can see there are only a few paths that will return responses. Most of these paths result in `404 Not Found` responses unless you know the special `RID` value that corresponds to a specific recipient in a phishing campaign. The Gophish documentation elaborates on this further:
Note: Landing pages are stored in the database. Gophish generates a unique ID (called the rid parameter) for each recipient in a campaign, and uses this ID to dynamically load the correct landing page. To preview what a landing page will look like, you will need to either use the HTML editor seen below, or launch a test campaign. Simply browsing directly to the Gophish listener without specifying an rid parameter will display a generic 404 page.
Requesting a landing page path without a valid `RID` value or requesting any unhandled path such as `/` results in a `404 Not Found` response as shown below:
    GET / HTTP/1.1
    Host: 127.0.0.1
    Accept: */*

    HTTP/1.1 404 Not Found
    Content-Type: text/plain; charset=utf-8
    Vary: Accept-Encoding
    X-Content-Type-Options: nosniff
    Date: Thu, 08 Aug 2019 23:09:09 GMT
    Content-Length: 19

    404 page not found
At first glance this looks like a reasonably generic HTTP response that you could potentially see from any random web server. Let’s go through the individual components of the response and consider how unique each value is:
- The HTTP response code is `404 Not Found`, which is generic and uninteresting.
- There’s an HTTP body of `404 page not found` that has a consistent length of 19 bytes. This might be useful but it seems like something a lot of web servers would return.
- Some optional HTTP headers are specified, such as `Content-Type: text/plain; charset=utf-8` and `Vary: Accept-Encoding`. These might be useful if this particular combination of headers is unique, but these headers seem common enough.
If we look through the Gophish source code we can’t find any reference to the `404 page not found` string or the optional HTTP headers. If we look upstream to the Go HTTP Package source code we can see the `404 page not found` string, the `X-Content-Type-Options` header and the `Content-Type` header are hardcoded values. This means these values could be useful for fingerprinting HTTP servers built using the Go HTTP Package but not for fingerprinting Gophish directly.
A quick review of the Go HTTP Package source code and documentation doesn’t reveal much else in the way of unique behaviour. One interesting feature is that the Go HTTP Package supports HTTP/2 by default unless it’s explicitly disabled. This might be useful as a fingerprint because HTTP/2 support is a recent technology that is only supported in newer versions of HTTP libraries and web servers. However, this may also be less useful as a fingerprint because HTTP/2 is not yet universally supported (e.g. if an HTTP/2 compatible server is sitting behind a CDN or reverse proxy that only supports HTTP/1.1).
The only Gophish path that returns any real content without a valid `RID` value is `/robots.txt`, which has a hard-coded response in the source code. This results in a response that looks like the one below:
    GET /robots.txt HTTP/1.1
    Host: 127.0.0.1
    Accept: */*

    HTTP/1.1 200 OK
    Vary: Accept-Encoding
    Date: Thu, 08 Aug 2019 23:12:11 GMT
    Content-Length: 26
    Content-Type: text/plain; charset=utf-8

    User-agent: *
    Disallow: /
This seems like a better fingerprint because it’s something specific to Gophish rather than the Go HTTP Package. However, the response still looks generic and looks like it could be the robots.txt from any random website.
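A minimal sketch of this check is shown below. It compares a `/robots.txt` response against the static body seen in the response above. The trailing newline is an assumption inferred from the `Content-Length: 26` value, and the function name is hypothetical.

```python
# Static body inferred from the response shown above: two lines, 26 bytes.
# The exact line endings are an assumption based on the Content-Length value.
GOPHISH_ROBOTS_BODY = b"User-agent: *\nDisallow: /\n"


def looks_like_gophish_robots(status_code: int, body: bytes) -> bool:
    """Return True when a /robots.txt response matches the static body byte for byte."""
    return status_code == 200 and body == GOPHISH_ROBOTS_BODY
```

A byte-for-byte comparison is deliberately strict: a robots.txt with the same directives but different whitespace or extra rules will not match.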
A Methodology For Hunting Gophish Servers
Overall the possible fingerprint values we have for Gophish aren’t looking too promising. We have the `404 page not found` string and a couple of HTTP headers that could maybe be used to find web servers served by the Go HTTP Package. If you found a server that matches this fingerprint you could then possibly confirm it was a Gophish server by requesting `/robots.txt` and checking if you got the correct static response. If you are on a Blue Team looking at a single web server that you are already suspicious of then maybe this is enough to confirm it’s running Gophish. But what if you are looking at 1,000 web servers or all of the web servers on the Internet? This fingerprint seems like it would be too weak to hunt for Gophish servers when they are hidden in a larger pool of web servers.
In order to check how effective this fingerprint is we first need a large database of HTTP responses to search through. There are many Internet search engine projects like Shodan or Censys that could be used for this purpose. My personal preference is Rapid7’s Project Sonar because it’s easy to grab a bulk dump of all of the data and search through it with my own tools offline.
We start our hunt by downloading the HTTP and HTTPS GET response data from Project Sonar. These datasets contain responses to an HTTP GET request for the `/` path across the entire IPv4 address space. The initial filtering process we run is to search for responses that:
- Have an HTTP status code of `404 Not Found`
- Have an HTTP body content of `404 page not found` and no other data, by checking for a `Content-Length` of 19
- Specify the optional `Content-Type: text/plain; charset=utf-8` and `X-Content-Type-Options: nosniff` headers
This returned just under 100K unique IP addresses in the Project Sonar dataset at the time of searching. Potentially this search could be further improved by ignoring responses that also include other HTTP headers that we know Gophish doesn’t send. For example, we didn’t observe any `Set-Cookie` headers being sent by Gophish. However, there is a risk we might miss cases where a reverse proxy in front of a Gophish server is adding headers.
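The filtering step could be sketched roughly as follows. This is not the tooling used for the hunt. It assumes each dataset record is a JSON object on its own line with an `ip` field and a base64-encoded `data` field holding the raw HTTP response; those field names, and all of the function names, are assumptions made for illustration.

```python
import base64
import json
from typing import Dict, Iterable, Iterator, Tuple

# Body of the 404 response shown earlier: 18 characters plus a newline = 19 bytes.
GO_404_BODY = b"404 page not found\n"


def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a raw HTTP/1.x response into status code, lower-cased headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def matches_go_404(raw: bytes) -> bool:
    """Check the status code, body and hardcoded headers of the Go 404 response."""
    try:
        status, headers, body = parse_response(raw)
    except (IndexError, ValueError):
        return False  # not a parseable HTTP response
    return (
        status == 404
        and body == GO_404_BODY
        and headers.get("content-type") == "text/plain; charset=utf-8"
        and headers.get("x-content-type-options") == "nosniff"
    )


def candidate_ips(lines: Iterable[str]) -> Iterator[str]:
    """Yield the IP address of every record whose response matches the fingerprint."""
    for line in lines:
        record = json.loads(line)  # assumed layout: {"ip": ..., "data": <base64>}
        if matches_go_404(base64.b64decode(record["data"])):
            yield record["ip"]
```

Because `candidate_ips` is a generator, it can be fed an open file handle and will stream through a large dump without loading it into memory.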
We now have a set of roughly 100K web servers whose responses resemble the Go HTTP Package fingerprint. Next we can request `/robots.txt` from all of these web servers and check if the response is the same as the hardcoded response in the Gophish source code. This process reduces the set of possible candidates to a much more manageable 1,181 IP addresses. A quick search through the results shows there are some web servers that return headers indicating they are unlikely to be Gophish servers, such as `Server: Microsoft-IIS/8.5` and `X-XSS-Protection: 1; mode=block`. There are definitely refinements that could be made to this filtering process but 1,181 is a small enough number to not worry about some false positives.
Earlier we learned that Gophish won’t return any landing pages without a valid `RID` value. Without a valid `RID` value all that we can retrieve from the web server is the `/robots.txt` page or a `404 Not Found` response. This means there is no direct way to confirm that any of these 1,181 web servers are running Gophish or hosting phishing pages. One way we can check if any of these web servers were likely used for phishing is to search passive DNS databases (e.g. Project Sonar’s Forward DNS Dataset) for the IP address. Phishing pages are largely served from registered domain names (rather than raw IP addresses) that are intended to impersonate legitimate domain names of target organisations or well-known brands. We can review known DNS records for the IP address and check the results for any domains that seem suspicious.
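The DNS lookup step amounts to building a reverse index from candidate IP addresses to the names that resolve to them. The sketch below assumes forward DNS records arrive as JSON lines with `name`, `type` and `value` fields; that layout and the function name are assumptions for illustration rather than a description of any particular dataset.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set


def names_by_ip(fdns_lines: Iterable[str], candidates: Set[str]) -> Dict[str, Set[str]]:
    """Map each candidate IP address to the DNS names whose A records point at it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for line in fdns_lines:
        record = json.loads(line)  # assumed layout: {"name": ..., "type": ..., "value": ...}
        if record.get("type") == "a" and record.get("value") in candidates:
            index[record["value"]].add(record["name"].lower())
    return dict(index)
```

Keeping the candidate set in memory and streaming the DNS records past it means the much larger DNS dataset only needs to be read once.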
We now have an end-to-end hunting methodology to go from all web servers on the Internet to a small candidate list of possible Gophish servers:
- Gophish uses the Go HTTP Package which has a fingerprintable HTTP response – we can find IP addresses hosting web servers with a similar fingerprint.
- Phishing pages are usually served from a domain name – we can retrieve known DNS records for IP addresses we have found.
- The domain names used to host phishing pages often impersonate legitimate domain names – we can check the list of DNS names for suspicious domain names.
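The final step, checking DNS names for suspicious domains, was largely a manual review, but a first pass can be automated. The sketch below flags names that contain a brand keyword without belonging to that brand's real domain. The watch list and function names are hypothetical and purely illustrative.

```python
from typing import Dict, Iterable, List

# Illustrative watch list: brand keyword -> the brand's real registered domain.
BRANDS: Dict[str, str] = {
    "microsoft": "microsoft.com",
    "amazon": "amazon.com",
    "google": "google.com",
    "apple": "apple.com",
    "linkedin": "linkedin.com",
}


def is_suspicious(name: str) -> bool:
    """Flag a DNS name that mentions a brand but is not under the brand's own domain."""
    name = name.lower().rstrip(".")
    for keyword, real_domain in BRANDS.items():
        is_real = name == real_domain or name.endswith("." + real_domain)
        if keyword in name and not is_real:
            return True
    return False


def suspicious_names(names: Iterable[str]) -> List[str]:
    """Return the flagged names in sorted order for manual review."""
    return sorted(n for n in names if is_suspicious(n))
```

A plain substring match is crude. It will flag unrelated names that happen to contain a keyword, and it will miss lookalike spellings that swap characters, so the output is a starting point for review rather than a verdict.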
After performing a hunt for Gophish using the above methodology we have a raw dataset of candidate Gophish servers consisting of IP addresses and DNS records. This dataset has been made available for download here.
We know this dataset contains some false positives but a search for “gophish” across the DNS records confirms that it looks like we have found at least some real Gophish servers:
Impersonating Well-Known Brands
Most of the obvious phishing domain names in the dataset are impersonating well-known brands such as Microsoft, Amazon, Google, Apple and LinkedIn. Some examples are shown below:
Phishing Simulation Providers
We can find some examples of legitimate phishing simulation services who appear to be running Gophish. For example, EveryCloud are a provider of phishing simulations and have a list of domain names that they provide to clients to be whitelisted. We can see these domain names in our dataset which suggests that EveryCloud are probably users of Gophish:
Internal Phishing Awareness Programs
Several organisations can be identified that are using Gophish to run internal phishing awareness programs. For example, the IP address `184.108.40.206` is associated with several domain names related to United Nations agencies:
Some of these domains host content advising visitors that a phishing simulation is being performed:
Several of the IP addresses and domain names in the dataset can be easily linked to security consultancies who are likely performing phishing simulations, penetration tests or red team exercises. As a professional courtesy this blog post avoids calling out any specific examples but these are easy enough to find with some basic analysis. In most cases the lack of concern for operational security and attribution is probably intentional because it’s not essential for all types of security exercises. Even so, I know I would have a bad day if someone dissected my phishing infrastructure in a blog post in the middle of an exercise.
Some of the giveaways that infrastructure might be operated by a security consultancy include:
- Hosting phishing servers on IP addresses or domain names that are directly attributable to the company (e.g. DNS records pointing a sub-domain of your corporate domain name to the Gophish server).
- Domain WHOIS information containing real contact details of your employees.
- Targeting organisations that are highly unlikely to be targeted by a real threat actor.
- Targeting multiple organisations in the same geographic area who are in vastly different industries.
Retrieving RID Values From VirusTotal
One way that we can attempt to retrieve Gophish RID values related to a domain is by searching for the domain on VirusTotal. You can search for a domain and see URLs hosted on the domain that have been previously scanned. In the case of Gophish phishing pages this can include the RID value in the URL parameters. For example, if we search VirusTotal for naturesb0unty.com we see there are several undetected URLs containing RID values. This means we can potentially discover and visit phishing pages hosted by Gophish if they are still live as shown below:
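Once a list of previously scanned URLs has been collected for a domain, pulling out the RID values is simple to automate. The sketch below assumes the value is carried in a query parameter named `rid`, as the Gophish documentation quoted earlier describes. The function name is hypothetical, and how the URLs are obtained is out of scope here.

```python
from typing import Iterable, List
from urllib.parse import parse_qs, urlsplit


def extract_rids(urls: Iterable[str]) -> List[str]:
    """Collect the distinct `rid` query parameter values, in first-seen order."""
    rids: List[str] = []
    for url in urls:
        query = parse_qs(urlsplit(url).query)
        for rid in query.get("rid", []):
            if rid not in rids:
                rids.append(rid)
    return rids
```

For example, `extract_rids(["http://phish.example/?rid=abc1234"])` returns `["abc1234"]`, while URLs without an `rid` parameter contribute nothing.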
The results have demonstrated that it’s possible for anyone to hunt for Gophish servers across the Internet. This is due to a combination of Gophish being built with the Go HTTP Package and the inherently suspicious nature of most phishing domains. Depending on your use case for Gophish this may or may not be something you need to be concerned about:
- Penetration Testers & Red Teamers – This is a good reminder to always understand your tools and to be vigilant about operational security. If you are concerned about your Gophish server being identified then consider obscuring the fingerprint by placing a reverse proxy in front of your phishing server or making changes to the Gophish source code.
- Internal Phishing Awareness Programs & Phishing Simulation Providers – You’re already in a position to whitelist your phishing servers and Blue Teams are often provided with exercise indicators in advance. This means it probably doesn’t matter if someone is able to discover your phishing server.
- Gophish Developers – This isn’t a vulnerability or major issue with Gophish. You shouldn’t feel obligated to make changes to mitigate the operational security choices of your users.