Blog


How to Make a List of Nearly Every Minecraft Player

I’ve recently been engaging in some tomfoolery to acquire a list of 51 million Minecraft: Java Edition player UUIDs (out of ~61 million total existing UUIDs). This blog post will explain exactly what I did to make this list.

Abusing the Mojang API with IPv6

Mojang has an internal API (documented by the community at wiki.vg) which the game uses to convert player usernames to UUIDs and to obtain information about player UUIDs. Mojang also allows anyone to use the API for their own purposes, but with ratelimits (about 10 requests per IP per second). The most obvious way of circumventing the ratelimits is obtaining proxies, but proxies tend to be slow and obtaining many high-quality proxies is costly.

wiki.vg uuid to profile and skin/cape
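To give an idea of what's being hammered here: the username-to-UUID lookup is a plain unauthenticated GET. A minimal sketch, assuming the reqwest (blocking) and serde_json crates:

// Rough sketch of the documented username -> UUID lookup; not production code.
fn uuid_for(username: &str) -> Option<String> {
    let url = format!("https://api.mojang.com/users/profiles/minecraft/{username}");
    let resp: serde_json::Value = reqwest::blocking::get(url).ok()?.json().ok()?;
    // A hit looks like {"id": "<uuid without dashes>", "name": "<username>"}.
    resp["id"].as_str().map(str::to_string)
}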

One solution to this problem is IPv6. Most server hosts will provide you with a /64 subnet (2^64 addresses), so by using a random IPv6 address for each request you can sidestep the ratelimits. There’s an open-source project on GitHub called freebind that describes itself as an “IPv6 address rate limiting evasion tool” which lets you conveniently enable the IP_FREEBIND socket option and randomize the bind address for every socket opened by a program. freebind is great and worked as advertised, but after some testing I noticed that the Mojang API was returning a significant number of 429 Too Many Requests responses even though I was using 18 quintillion different IPs. As it turns out, the Mojang API does some per-subnet ratelimiting for IPv6.

I’m not the first person to do this, and after asking around a bit I was informed that you can use Hurricane Electric’s tunnel broker service to get a /48 (2^80 addresses) for free. Hurricane Electric has a bunch of silly things on their website, but the silly thing I’m using here is tunnelbroker.net. After signing up I created my tunnel, assigned it a /48, and used their route2 example configuration to add it to my server.

Hurricane Electric's example configuration page showing some IP commands

This all worked fine, and I was able to hit the Mojang API at approximately 400 requests per second. However, this is quite slow when you consider that there are millions of accounts and many more possible username combinations.

The first optimization I did was getting rid of freebind and Rewriting it in Rust™ instead, using raw socket syscalls and a custom Hyper connector. Basically all my custom connector does is create an AF_INET6 socket, set IPV6_FREEBIND on it, bind to a random IPv6 address in our subnet, and connect to the destination IP. This didn’t provide a significant speedup at the time, but it did help with a later optimization. The second “optimization” I did was moving the server and the tunnel endpoint geographically closer together and into the US, where the significantly better ping let me do more concurrent requests at a time. I also realized that the Mojang API supports HTTP/2, which allows multiplexing multiple requests over a single connection, so I modified my code to reuse the same HTTP client for every chunk of 10 requests. This helped significantly, making it approximately 6 times faster. Finally, to speed up username lookups, I made my code use Mojang’s bulk username lookup endpoint, which resolves the UUIDs of up to 10 usernames per request. Now I’m able to do about 8,000 UUID lookups per second on average (and 80,000 username lookups per second), so it’s time to start actually making use of that speed.
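Conceptually, the connector boils down to something like this minimal sketch, assuming the socket2, libc, and rand crates (the real version plugs into Hyper rather than returning a std TcpStream, and 2001:db8::/48 is a placeholder documentation prefix):

use std::net::{Ipv6Addr, SocketAddrV6, TcpStream};
use std::os::fd::AsRawFd;

// Pick a random address inside a routed /48 (2^80 possible hosts).
fn random_in_48(prefix: u128) -> Ipv6Addr {
    Ipv6Addr::from(prefix | (rand::random::<u128>() >> 48))
}

fn connect_from_random_ip(dest: std::net::SocketAddr) -> std::io::Result<TcpStream> {
    let socket = socket2::Socket::new(
        socket2::Domain::IPV6,
        socket2::Type::STREAM,
        Some(socket2::Protocol::TCP),
    )?;
    // Freebind lets us bind an address that isn't configured on any local
    // interface, as long as the subnet is routed to this machine.
    unsafe {
        let one: libc::c_int = 1;
        libc::setsockopt(
            socket.as_raw_fd(),
            libc::IPPROTO_IP,
            libc::IP_FREEBIND,
            &one as *const _ as *const libc::c_void,
            std::mem::size_of::<libc::c_int>() as libc::socklen_t,
        );
    }
    let src = random_in_48(0x2001_0db8_0000_0000_0000_0000_0000_0000);
    socket.bind(&SocketAddrV6::new(src, 0, 0, 0).into())?;
    socket.connect(&dest.into())?;
    Ok(socket.into())
}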

Scraping for UUIDs

I already had a few small UUID lists in my hands. I’d previously made a Minecraft server scanner that logged every player on every server, so by gathering up those UUIDs and usernames and feeding them into my program I got a list of about 5 million UUIDs. Next, I knew about a Hypixel Forums post with 7 million UUIDs that had been gathered by crawling friend lists through the Hypixel API, so I checked all of those. Later, I also found a deleted post on the forums with 14.4 million UUIDs, but luckily it was archived by Bing’s cache and the download link was still active.

A screenshot of a post on the Hypixel Forums: I disagree with the removal of the friend endpoint in the API. As a protest, here are 14,419,374 uuids gathered with it. This data was gathered last year. It is alphabetically sorted.

I then made my Mojang API ratelimit evasion tool into a public API for my friends and me to use, and I made it save all valid UUID and username lookups into a SQLite database. It’s hard to estimate how many new UUIDs I got from people using my API, but it’s at least a few thousand.

At this point I had 11.1 million UUIDs, but that wasn’t enough. There’s a website called NameMC which allows you to conveniently look up players and see some basic information about them. It also happens to have a wildcard search for usernames, so, for example, by searching abc* you can see every player whose username starts with abc. NameMC’s existed for a long time, so I figured scraping NameMC’s database by abusing wildcards would be a good way to get a lot of UUIDs. There were a few things that made this harder though:

  1. Wildcard queries must have at least 3 characters
  2. Wildcard searches are limited to 1,000 results
  3. Cloudflare’s “under attack” mode is permanently enabled, so there’s a captcha every few minutes
  4. There’s a ratelimit for searching

To work around the first two issues, I made a program that finds every possible username by searching aaa*, aab*, etc., and adding an extra character whenever a query hit the 1,000-result cap. For the Cloudflare captchas, I knew they aren’t particularly complex and are solvable with free web scraping libraries. The first library I tried was undetected-chromedriver, but it turns out Cloudflare has started being able to detect it. I then looked into puppeteer-stealth, but it was detected too. Fortunately, one of the issues on undetected-chromedriver mentioned that a package named DrissionPage still worked for clicking Cloudflare captchas. The documentation for DrissionPage is all written in Chinese, but between the examples and some Google Translate I eventually got it working.

A grid of 8 Chrome windows showing a Cloudflare captcha page, with the captchas all being clicked simultaneously
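Back to the wildcard enumeration: the search itself is just a recursive prefix expansion. A sketch of the idea, where the search closure standing in for a real NameMC wildcard query is hypothetical:

const ALPHABET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789_";

// `search` performs a "{prefix}*" wildcard query capped at 1,000 results.
fn crawl(prefix: &str, search: &impl Fn(&str) -> Vec<String>, out: &mut Vec<String>) {
    let results = search(prefix);
    if results.len() < 1000 {
        // Under the cap, so this is everything matching the prefix.
        out.extend(results);
        return;
    }
    // Capped: refine by one more character and recurse. (A name exactly
    // equal to `prefix` needs special handling, omitted here.)
    for &c in ALPHABET {
        crawl(&format!("{prefix}{}", c as char), search, out);
    }
}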

Now, there’s the issue of ratelimits. Of course I could use IPv6 once more, but since these were Chrome windows on my computer rather than plain HTTP requests made by a server, that would be trickier. I toyed with the idea of using proxies, but after some testing with NameMC I discovered that they check the X-Forwarded-For header for ratelimiting, so by randomizing the value of that header I’d never get ratelimited.

Another grid of 8 Chrome windows, this time showing some wildcard searches
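The trick is easier to see with a plain HTTP client; here’s a hypothetical sketch with the reqwest and rand crates (the actual scraping went through the DrissionPage-driven Chrome windows):

use rand::Rng;

// A fresh spoofed X-Forwarded-For value for every request.
fn random_ipv4() -> String {
    let mut rng = rand::thread_rng();
    format!("{}.{}.{}.{}", rng.gen::<u8>(), rng.gen::<u8>(), rng.gen::<u8>(), rng.gen::<u8>())
}

fn search(client: &reqwest::blocking::Client, query: &str) -> reqwest::Result<String> {
    client
        .get("https://namemc.com/search")
        .query(&[("q", query)])
        .header("x-forwarded-for", random_ipv4())
        .send()?
        .text()
}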

Scraping NameMC took a couple days; I’m well aware it’s possible to do this faster, but I didn’t mind waiting. When I finished scraping NameMC and checking all of their usernames, I had 31.4 million UUIDs. Later I also learned that there’s another way of scraping their database, by passing negative offsets to their minecraft-names page, but I don’t think it would’ve been much faster anyway.

Username stuffing

At this point I had pretty much every player who’d played multiplayer at least a few times, so it didn’t seem like there were many more ways to get new players. I posted my data on archive.org and told some friends about it, including Northernside. Northern is somewhat well-known for messing with Mojang, and as it turns out he’d also been collecting Minecraft UUIDs and had 32.8 million of them at the time.

me telling Northernside about my archive.org post, and him replying: thats fun, i got about 32.8 mil rn

He told me he’d also scraped NameMC, and was getting new UUIDs by checking usernames from data breaches and making slight variations of existing usernames. Inspired by Northern’s shenanigans, I began downloading data breaches from questionable sources and stuffing their usernames into the Mojang API. I got many millions more from this, but eventually, after checking billions of usernames, I started to run out of large data breaches to try. One dump I was interested in trying but couldn’t due to its size was the Reddit archives by Pushshift, but luckily my friend cbax was able to download it and provide me with the usernames of everyone who’s ever posted on Reddit.

Now I had to try some more creative methods of brute-forcing usernames. My friend Overlord volunteered to make an AI model that generates names based on my dataset, and checking the several hundred million names he provided resulted in about a million new ones. I also did some basic brute-forcing, like checking every possible combination of characters up to 6 characters long and checking a-z-only names at 7 characters (checking every 7-character combination would take 2 weeks at my speed, so I haven’t finished those yet). In addition, I did some slightly more complex brute-forcing, like taking the top 1,000 most common numbers and appending them to every name in the database, and making a Minecraft-specific dictionary by splitting the words in names and then checking combinations of those.
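The exhaustive part doesn’t need anything fancier than a recursive generator that streams candidates into the bulk lookup endpoint. A sketch, where the emit callback (which would batch names into groups of 10) is hypothetical:

const ALPHABET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789_";

// Stream every candidate name up to `max_len` characters to `emit`. Names
// under 3 characters can't be created anymore, so they're skipped.
fn enumerate(max_len: usize, prefix: &mut Vec<u8>, emit: &mut impl FnMut(&str)) {
    if prefix.len() >= 3 {
        emit(std::str::from_utf8(prefix).unwrap());
    }
    if prefix.len() == max_len {
        return;
    }
    for &c in ALPHABET {
        prefix.push(c);
        enumerate(max_len, prefix, emit);
        prefix.pop();
    }
}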

One fun brute-force I did involved obtaining a list of every .com, .net, .org, and .dev domain. ICANN has a website called CZDS where you can get a list of every domain by just making an account and requesting access. Checking all of these did result in a few hundred thousand more UUIDs, which I found amusing.

At some point in this process I coincidentally met another UUID harvester named Yuno, after he joined a Discord server about Hypixel SkyBlock programming and claimed to have 55 million UUIDs.

yuno: I have 55m mojang uuids

I learned Yuno is in the same group as Northernside, and after we talked a bit he told me he’d also obtained his list from scraping, stuffing usernames from data breaches, and generating usernames. He’s also where the 61 million estimate at the beginning of this blog post comes from; he got it by extrapolating with Hypixel’s lifetime player count.

Epilogue

To help me reach 50,000,000 UUIDs, Northernside and Semisol (who had scraped for Hypixel players a few years ago) also donated their lists to me. At the time of writing, I have a total of 51,569,249 UUIDs. I’ve published all the UUIDs (and usernames, and Mojang API responses) I have at archive.org/details/minecraft-uuids-2024-02-22.

If you’d like to check if you’re in the dataset (you probably are), here’s a convenient widget for searching usernames in my database:

FAQ

Why?

The voices

What could the data be useful for?

The data will probably be useless to you, but you could use it to reduce the number of requests you have to make to the Mojang API. It could also be used for making user lookup websites like NameMC, or maybe for something like training AI on usernames or skins.

How could the data be abused?

Making archives of user-generated content will usually be controversial, since it makes it harder for users to delete their data. However, I believe the harm here is minimal, since my dataset doesn't contain much (just UUIDs, usernames, and skins) and there are other ways of obtaining people's old names anyway (like NameMC, laby.net, etc).

Stats

Here’s some random miscellaneous stats about usernames that I think are interesting:

Length distribution:

1: 4
2: 332
3: 51k
4: 586k
5: 2.0m
6: 4.8m
7: 6.7m
8: 7.5m
9: 6.9m
10: 6.1m
11: 5.0m
12: 4.0m
13: 2.9m
14: 2.2m
15: 1.8m
16: 1.0m
19: 1


Most common words (split by underscores and camelCase):
1. the
2. mr
3. xx
4. king
5. mc
6. man
7. gamer
8. yt
9. big
10. its
Most common suffixes (numbers and underscores at the ends of names):
1. _
2. 1
3. 2
4. 123
5. 3
6. 7
7. 0
8. 12
9. 69
10. 11
The most common years out of the suffixes seem to be:
1. 2004
2. 2003
3. 2010
4. 2002
5. 2005
6. 2012
7. 2001
8. 2006
9. 2007
10. 2011
Most desired names (most common names when the suffixes are removed):
1. alex
2. shadow
3. max
4. jack
5. chris
6. daniel
7. david
8. nick
9. ghost
10. leo

Making a metasearch engine

In 2020, tired of every search engine seemingly having suboptimal results and missing the instant answers I wanted, I decided to make a search engine for myself. I knew making a general-purpose web search engine from scratch by myself was infeasible, so instead I opted to make a metasearch engine, which aggregates results from other web search engines. First I tried forking Searx, but it was slow and the old Python codebase was annoying to work with. So instead of forking an existing project, I made my own (but with several ideas borrowed from Searx) in NodeJS, which I called simply “metasearch” (very unique name). I used it as my primary search engine for over a year, but it was slow (mostly due to being hosted on Replit and written in JS) and brittle, to the point where at the time of writing the only working search engine left is Bing.

A few weeks ago I decided to rewrite metasearch as (brace for it) metasearch2 (my project names only continue to get more original). In this rewrite I implemented several of the things I wish I would’ve done when writing my first metasearch engine, including writing it in a blazingly fast 🚀🚀🚀 language. There’s a hosted demo at s.matdoes.dev, but I’d much rather you host it yourself so I don’t start getting captcha’d and ratelimited. This blog post will explain what you should know if you want to make a metasearch engine for yourself.

The search results for 'metasearch' on my metasearch engine

Other (meta)search engines

First, some prior art. The metasearch engine most people know is probably Searx (now SearxNG), which is open source, written in Python, and supports a very large number of engines. It was the biggest inspiration for my metasearch engine. The main things I took from it were how result engines are shown on the search page and its ranking algorithm. However, as mentioned previously, it’s slow and not as hackable as its readme would like you to think. The (probably) second most well-known metasearch engine is Kagi, which sources its results from its own crawler, Google, Yandex, Mojeek, Marginalia Search, and Brave (I’ll talk about these search engines later). One interesting feature Kagi has that users seem to appreciate is the ability to raise/lower rankings for chosen domains. I haven’t used Kagi much, but the reasons I don’t use it are that it’s paid (I can’t afford to pay $10/month for a search engine) and that I can’t customize it as much as I can customize my own code. There have also been other metasearch engines in the past, like Dogpile and metacrawler (both still exist, surprisingly), but they’re not worth talking about.

The search results for 'metasearch' on a random SearxNG instance

Also, of course, there’s my metasearch engine. Instead of just listing what engines I use, I’ll share my opinion of every search engine that I think is interesting. I haven’t used some of these in years, so if you think their quality has changed in that time, let me know.

  • Google: Some people deny it, but from my experience it still tends to have the best results of any normal search engine. However, they do make themselves somewhat annoying to scrape without using their (paid) API.
  • Google’s API: It’s paid, and its results appear to be worse sometimes, for some reason. You can see its results by searching on Startpage (which sources exclusively from Google’s API). However, you won’t have to worry about getting captcha’d if you use this.
  • Bing: Bing’s results are worse than Microsoft pretends, but it’s certainly a search engine that exists. It’s decent when combined with other search engines.
  • DuckDuckGo/Yahoo/Ecosia/Swisscows/You.com: They just use Bing. Don’t use these for your metasearch engine.
  • DuckDuckGo noscript: Definitely don’t use this. I don’t know why, but when you disable JavaScript on DuckDuckGo you get shown a different search experience with significantly worse results. If you know why this is, please let me know.
  • Brave: I may not like their browser or CEO, but I do like Brave Search. They used to mix their own crawler results with Google, but not anymore. Its results are on par with Google’s.
  • Neeva: It doesn’t exist anymore, but I wanted to acknowledge it since I used it for my old metasearch engine. I liked its results, but I’m guessing they had issues becoming profitable and then they did weird NFT and AI stuff and died.
  • Marginalia: It’s an open source search engine that focuses on discovering small sites. Because of this, it’s mostly only good at discovering new sites and not so much for actually getting good results. I do use it as a source for my metasearch engine because it’s fast enough and I think it’s cute, but I heavily downweigh its results since they’re almost never actually what you’re looking for.
  • Yandex: I haven’t used Yandex much. Its results are probably decent? It captchas you too frequently though and it’s not very fast.
  • Gigablast: Rest in peace. It’s open source, which is cool, but its results sucked. Also the privacy.sh thing they advertised looked sketchy to me.
  • Mojeek: I’m glad that it exists, but its results aren’t very good. Also it appears to be down at the time of writing, hopefully it’s not going the way of Gigablast.
  • Metaphor: I found this one very recently. Its results are impressive, but it’s slow, and the way they advertise it makes me think it’ll stop existing within a couple years.

Scraping

If you didn’t figure it out already, the engines I use for my metasearch engine are Google, Bing, Brave, and Marginalia. Some of these have APIs, but I chose not to use them due to pricing, worse results, and the added complexity of requiring API keys. Scraping the sites is relatively easy. In my NodeJS implementation I used Cheerio for parsing the HTML, and in my Rust implementation I used Scraper. They’re both very nice. The most annoying part of scraping is just figuring out what selectors to use, but it’s not too bad. To provide an example, here are the CSS selectors I use for Google:

  • Search result container: div.g > div, div.xpd > div:first-child (Search results are usually in .g, but not always. Other results, including ads, are in .xpd, but the :first-child filters out the advertisements since ads never have a div as their first element)
  • Title: h3
  • Link: a[href] (For some search engines you have to get the text instead of the href, and for Bing you have to get the href and then base64 decode a parameter since it’s a tracking URL. Google used to put a tracking URL on their a tags too, but it seems to have mostly been removed, except on the links for featured snippets).
  • Description: div[data-sncf], div[style='-webkit-line-clamp:2'] (I don’t like this at all, but Google is inconsistent with how descriptions are done in their HTML so both selectors are necessary to reliably get it).

Note how I avoid using class names that look randomly generated, only using g and xpd (whatever they stand for) since I’ve noticed those never change, while the other random class names do change every time Google updates their website. You can check my source code and SearxNG’s if you need help figuring out what selectors to use, but you should always check your target site’s HTML in devtools.

Firefox devtools open, showing the HTML for the Google result for 'metasearch'
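Put together, extraction with the Scraper crate looks roughly like this sketch, using the selectors above (which rot whenever Google changes their markup):

use scraper::{Html, Selector};

// Pull (title, url, description) triples out of a Google results page.
fn parse_google(html: &str) -> Vec<(String, String, String)> {
    let doc = Html::parse_document(html);
    let result_sel = Selector::parse("div.g > div, div.xpd > div:first-child").unwrap();
    let title_sel = Selector::parse("h3").unwrap();
    let link_sel = Selector::parse("a[href]").unwrap();
    let desc_sel = Selector::parse("div[data-sncf], div[style='-webkit-line-clamp:2']").unwrap();

    let mut results = Vec::new();
    for el in doc.select(&result_sel) {
        let title = el.select(&title_sel).next().map(|t| t.text().collect::<String>());
        let link = el.select(&link_sel).next().and_then(|a| a.value().attr("href"));
        let desc = el
            .select(&desc_sel)
            .next()
            .map(|d| d.text().collect::<String>())
            .unwrap_or_default();
        if let (Some(title), Some(link)) = (title, link) {
            results.push((title, link.to_string(), desc));
        }
    }
    results
}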

Some websites make themselves annoying to scrape in ways beyond just having ugly HTML, though. Google is the major culprit here: it appears to always captcha requests coming from a Hetzner IPv6 address, it blocks your requests after a while if you’re using your HTTP client library’s default user agent, and it captchas you if you make too many queries, especially ones with many operators. A couple of other things to watch out for that you might not notice: make sure your TCP connections are kept alive by your HTTP client library (usually done by reusing the same Client variable), and make sure compression is enabled (which can sometimes save hundreds of milliseconds on certain search engines).

Ranking

The algorithm I use for ranking is nearly identical to the one Searx uses, and it’s surprisingly effective for how simple it is.

def result_score(result):
    weight = 1.0

    # Multiply in the configured weight of every engine that returned this result.
    for result_engine in result['engines']:
        if hasattr(engines[result_engine], 'weight'):
            weight *= float(engines[result_engine].weight)

    occurences = len(result['positions'])

    # Sum the weighted reciprocal of every (1-indexed) position the result appeared at.
    return sum((occurences * weight) / position for position in result['positions'])

Note that the position is 1-indexed, otherwise you get results with infinite score, lol. The only change I made was to not multiply by occurences at the end (and instead just summing weight/position for each engine with the result). I never actually noticed I had this difference until writing this, but I don’t believe it made the rankings worse. Also keep in mind that you may want to slightly normalize the URLs, for example converting them to always be HTTPS and removing trailing slashes, so the results from different engines can merge more nicely.
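The normalization can be as simple as this sketch with the url crate (real merging probably wants more, like stripping tracking parameters):

// Normalize a result URL so the same page merges across engines.
fn normalize(raw: &str) -> String {
    let Ok(mut url) = url::Url::parse(raw) else { return raw.to_string() };
    if url.scheme() == "http" {
        // Assume the https version is the same page.
        let _ = url.set_scheme("https");
    }
    let path = url.path().trim_end_matches('/').to_string();
    url.set_path(&path);
    url.to_string()
}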

Instant answers

Instant answers are the widgets that show up when you search things like math expressions or “what’s my ip”. I think Google calls them just “answers”, but that can get confusing. In my old metasearch, I had a lot of unique ones for things like pinging Minecraft servers and generating lorem ipsum text. In my rewrite I didn’t implement as many since I don’t have as much of a need for them anymore, but I still did implement a few (and I haven’t finished adding all the ones I want yet). My favorite engine I implemented in my rewrite is the calculator, which is powered by a very neat Rust crate I found called fend. This makes it able to calculate big math expressions and do things like unit conversions. I did have to add a check to make it avoid triggering on queries that probably weren’t meant to be calculated, though. I also added some checks to support queries like ord('a') and chr(97) (I wasn’t able to add a custom function to fend to support chr, so it has a very big regex instead :3). I imagine you already have ideas for instant answers you could add. If you want more inspiration, a good source with over a thousand instant answers is DuckDuckHack (which DDG unfortunately killed).

Searching for 'ord('a') + 3' on metasearch2, the result is '0x64 = 100'
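Wiring fend up is pleasantly small; the gating around it is the fiddly part. A rough sketch with the fend_core crate, where the trivial-echo check is my illustration rather than metasearch2’s exact logic:

// Try to evaluate the query as a calculator expression.
fn calculator_answer(query: &str) -> Option<String> {
    let mut ctx = fend_core::Context::new();
    let result = fend_core::evaluate(query, &mut ctx).ok()?;
    let text = result.get_main_result().to_string();
    // Skip empty results and trivial echoes ("3" -> "3") so the widget
    // only shows up for real calculations.
    if text.is_empty() || text == query {
        return None;
    }
    Some(text)
}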

Another thing I chose to do is to use Google’s featured snippets and display them similarly to instant answers. Also, I made it so if a StackExchange, GitHub, or docs.rs link is near the top of the results, then they get scraped and shown on the sidebar (only on desktop though, it takes up too much space on mobile).

Rendering results

This part is mostly easy, since search engines usually don’t have any complex styling. You can use whatever web/templating framework you like, but ideally it should be one that can return the response in chunks (so the header HTML is sent immediately, and later the HTML for the search results is sent) since it makes it feel faster for the user. I chose not to use any templating framework for metasearch2 (since simplicity was one of my goals), and instead just made it build up the HTML by hand. This also made it very easy to add chunking, and I took advantage of this by having it show the user real-time progress updates of the search engines that are being requested.

The live progress display I have on metasearch2
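One way to get this chunking in Rust is axum 0.7’s streaming bodies; this sketch hardcodes two chunks, while real code would feed the stream from a channel as engines respond (and it isn’t necessarily how metasearch2 does it):

use axum::body::Body;
use axum::response::Response;
use futures::stream;

// The first chunk (page header) flushes immediately; results come later.
fn chunked_page() -> Response {
    let chunks = stream::iter([
        Ok::<_, std::io::Error>("<!DOCTYPE html><html><body>".to_string()),
        Ok("<ul><li>a search result</li></ul></body></html>".to_string()),
    ]);
    Response::builder()
        .header("content-type", "text/html; charset=utf-8")
        .body(Body::from_stream(chunks))
        .unwrap()
}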

Your turn

There are several things I intentionally omitted from my metasearch engine for the sake of simplicity. These include pagination, image search, reverse-image search, some type of authentication or ratelimiting to prevent abuse (I might have to do this eventually, hopefully not), more search/answer engines, and a way to configure it easily. I licensed metasearch2 as CC0 (public domain), so you can do absolutely whatever you want with it (attribution isn’t required, but it is appreciated). Have fun.

Why did "matscan" join my Minecraft server? (FAQ)

matscan is a Minecraft bot that joins potentially vulnerable Minecraft servers and sends a message in chat to inform the admins.

How should I secure my server?

It should’ve told you in its long chat message but some servers might cut it off:

If you’ve done all of the above, then you’re probably fine.

How did you find my server?

I scan the internet for Minecraft servers, basically sending a packet to every IP address and seeing which ones respond (it’s a little more complex than this).

Is your data public?

No. You should still secure your server though since there are several griefing/harassment groups that use their own server scanners.

Why did Herobrine try to join right before matscan?

matscan will try to join with the username Herobrine first, so if the server is offline-mode then it can demonstrate that people can join with any username. It may also use the username of a historical player if the server is offline-mode but has a whitelist.

How can I contact you?

My Matrix is @mat:matdoes.dev (preferred), but you might be able to find me on other social media.

How can I help?

If you appreciate the security work I do, please consider funding my projects at ko-fi.com/matdoesdev.

This website now supports Gemini

Gemini is a protocol similar to HTTP, in that it’s used for transmitting (mostly) text in (usually) a markup language. However, one of the primary goals of Gemini is simplicity. A request is a single TLS/TCP connection where the client just sends the URL it wants, and a correct response looks like 20 text/gemini\r\nhello world. Additionally, Gemini uses a language called “Gemtext” as its markup language. It’s kind of like Markdown, but even simpler. Every line can only contain a single type of data, so for example you can’t have links in the middle of text. Read the Gemini spec if you’re interested.

Translating HTML to Gemtext

Anyways, I decided to make my website support the Gemini protocol for fun. The plan is to make it translate the HTML on my blog into Gemtext, which shouldn’t be too hard considering the HTML is mostly generated from Markdown.

Here’s an example of a typical blog post I write, mostly markdown and some HTML.

At first, I tried using the html_parser Rust crate to read the HTML and flatten it out. However, I soon ran into issue #22: Incorrectly trimming whitespaces for text nodes. This made text get squished together with links, and while I technically could’ve worked around it by adding spaces, I figured it’d be better to avoid future issues by just using a different crate. I looked at other HTML parsing crates and decided on tl, which does not suffer from the same issue as html_parser.

If you remember from earlier, though, Gemini does not support inline links! I considered other options like putting every link at the end of the post, but I decided to make it dump the links at the end of every paragraph so they’re easy to find while you’re reading. To make images work, I had to make my crawler download them into a directory so the Gemini server could serve them easily. The actual Gemtext for them is straightforward though.
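The per-paragraph translation ends up looking something like this sketch (the Link struct is just an illustration of what the HTML walker collects):

struct Link {
    href: String,
    label: String,
}

// Emit a paragraph followed by its collected links as Gemtext "=>" lines,
// the only kind of link line Gemtext supports.
fn paragraph_to_gemtext(text: &str, links: &[Link]) -> String {
    let mut out = format!("{text}\n");
    for link in links {
        out.push_str(&format!("=> {} {}\n", link.href, link.label));
    }
    out
}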

TLS

To actually serve the Gemini site (capsule, technically), I initially thought I was going to use Agate, but I decided it would be more fun to make my own server (and it’d make it easier to integrate with the crawler). The only thing I was kind of worried about implementing was TLS. I started by copy-pasting from the Rustls examples on their docs, but I wasn’t sure how to make the self-signing work. I took a look at how Agate was doing it, and they’re also using Rustls but through tokio_rustls, and using a crate called rcgen for generating the certificates.

My code for that ended up looking kinda like this:

use rcgen::{Certificate, CertificateParams, DnType};
use tokio_rustls::rustls;

let mut cert_params = CertificateParams::new(vec![HOSTNAME.to_string()]);
cert_params
    .distinguished_name
    .push(DnType::CommonName, HOSTNAME);

// Generate the self-signed certificate.
let new_cert = Certificate::from_params(cert_params).unwrap();

let public_key = new_cert.serialize_der().unwrap();
let private_key = new_cert.serialize_private_key_der();

// Wrap the DER bytes in the types rustls expects.
let cert = rustls::Certificate(public_key);
let private_key = rustls::PrivateKey(private_key);
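
From there, wiring the certificate into rustls is a few more lines, sketched against the rustls 0.21-era builder that matches the types above:

// Build a rustls ServerConfig from the self-signed cert.
let tls_config = rustls::ServerConfig::builder()
    .with_safe_defaults()
    .with_no_client_auth()
    .with_single_cert(vec![cert], private_key)
    .unwrap();
let acceptor = tokio_rustls::TlsAcceptor::from(std::sync::Arc::new(tls_config));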

After I set it up to wrap the TCP connection with TLS, it worked! At least, it worked on Lagrange, my client of choice. I thought this would be the end of getting my server implementation to work, so I deployed it to a VPS, opened the port on IPv4 and IPv6, and added the A and AAAA records to Cloudflare.

(spoiler: it was not the end of getting my server implementation to work)

Making it work everywhere

I realized it may be a good idea to test on more clients, just to make sure it all worked properly. The second client I tried was Castor. When I tried loading my capsule on Castor, it didn’t load. I went looking for solutions, and stumbled upon a “Gemini server torture test”, which basically makes a bunch of crazy requests and checks that the server responds to all of them correctly. When I first ran it, my server was failing most tests. I looked at the failing tests that seemed most suspicious, and decided to implement TLS close_notify first, since not implementing it was a violation of the spec that I’d initially overlooked. Fortunately, implementing it was very easy, just a single line change. This fixed the capsule on Castor.

I then tried another client, a mobile one this time, called Buran. It did not load my capsule :sob:. I tried more clients, and the majority seemed to be failing as well. I implemented more fixes, some of which were covered by the torture test and some of which weren’t. This made the capsule accessible when I was hosting it locally, but not when it was deployed to my server.

I wasn’t sure how this was possible, and I considered the possibility of perhaps my server not supporting TLS 1.2 properly (I knew it supported 1.3 since the torture test tests for that). I found a random Gemini client Python library that failed to send requests to my server and modified it to always use TLS 1.3, but this did not resolve it either.

I added more logging to my server, and noticed that the clients weren’t even opening a TCP connection. Maybe it’s a DNS issue? DNS seemed to be working fine, but I noticed running print(socket.getaddrinfo('matdoes.dev', 1965)) from Python always puts the IPv6 first. Maybe it’s an issue with IPv6 then? The torture test has a check for IPv6 though… I removed the AAAA DNS record and waited a few minutes, and this actually worked!? I didn’t want to keep my site IPv4-only though, so I kept trying to track down the source of the issue. Maybe I had to put the IPv6 in expanded form when I pasted it into the DNS records?? (this did not work, of course).

After a bit of searching, I found a discussion on Tokio’s Axum web framework that seemed relevant.

The following results in an Axum which is available on port 3000 via IPv4 only. How can I make it available on IPv6, also? let addr = SocketAddr::from(([0, 0, 0, 0], 3000));

Try with: let addr = ":::3000".parse().unwrap();

Was this actually the solution? I was under the impression 0.0.0.0 would work for both IPv4 and IPv6. I replaced 0.0.0.0 with :: in my code, and this actually made it work everywhere! :tada: (I later replaced it with [::], just in case, though I don’t think it was actually necessary).
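For reference, the fix boils down to binding the IPv6 wildcard; on Linux (with the default bindv6only=0) this accepts IPv4 connections too, as v4-mapped addresses. A std-flavored sketch:

// Binding [::] instead of 0.0.0.0 listens on both address families on Linux.
let addr: std::net::SocketAddr = "[::]:1965".parse().unwrap();
let listener = std::net::TcpListener::bind(addr).unwrap();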

Caddy issues

This is completely unrelated to Gemini, but I wanted to mention it anyways. Originally, my website was hosted on Cloudflare Pages, since it’s just a static site. However if I wanted to make other ports accessible, I’d have to make it not be proxied by Cloudflare. I decided to just move it to the server I was already hosting my Matrix and Mastodon (technically Pleroma) instances on so I wouldn’t have to buy a new server.

I copied a script I wrote a while ago that automatically watches for changes on GitHub and runs a shell command when there’s a commit. I know it’s kind of cursed and I should be using a webhook or whatever but this works good enough. So anyways I made it put the build output in /home/ubuntu/matdoesdev/build and told Caddy to have a file-server route on matdoes.dev with that directory as the root.

I tried to reload Caddy, but it was taking an unusually long time and eventually timed out. I enabled debug logs but didn’t see anything too suspicious. I then tried to completely restart Caddy, but this made the Matrix and Pleroma instances on the server inaccessible… After waiting about ten minutes, the issue resolved itself and the other routes were accessible again.

The other routes, that is, not the route I was trying to add. This time, though, I was actually getting an error. When I tried to access the domain, I saw an error in the log that said something about not having enough permissions to read the directory. I modified the permissions on the directory and all the files in it to be readable, writable, and executable by every user, but this somehow did not resolve the issue.

I found a post on the Caddy forums that appeared to be about someone having the same issue as me.

The first answer:

the caddy user still has to have execution access for every parent folder in the path to traverse/reach the file.

Why??? I don’t want to give the Caddy user permission to access every parent folder. I ended up just making a /www directory and having it copy the build output to there, and I did not come across any more significant issues.

Other stuff

Maybe I’ll add support for more protocols to my website in the future? I saw lots of talk about Gopher while I was looking around the Geminispace, and maybe it’d be cool to also make the website accessible from Telnet or SSH or something.

Here’s the code for my crawler/translator/Gemini server, it’s not particularly great but it works.

Minecraft Server Scanning Inc

For several years I’ve occasionally logged onto Shodan and searched for Minecraft servers. I just join, look around, and maybe leave a sign for the server owner. I’d also occasionally heard stories about people making their own Minecraft server scanners.


A while ago, on April 1st 2022, cybersecurity YouTuber LiveOverflow uploaded a video titled “I Spent 100 Days Hacking Minecraft”. Despite being uploaded on April Fools’, the video and the series that followed were actually really interesting. Anyways, after a bit I got the idea of searching for “liveoverflow” on Shodan. To my surprise, the server actually showed up, and even more surprisingly it wasn’t whitelisted. There were signs at spawn that congratulated you but said “I hope you built a tool yourself”. I had not built a tool myself.

A few weeks later, Minecraft documentary YouTuber TheMisterEpic uploaded a video about “Minecraft’s most dangerous glitch”. Spoiler: the bug in the video is not dangerous. I wanted to let TheMisterEpic know, so I joined his Discord and pinged him in general chat. Some people were disagreeing with me, but a member in the server named Ada came to my defense. We talked a bit, and another server member named Gildfesh told me how he and Ada developed Minecraft hacks and had recently released a mod for faking Minecraft chat reports. Later, a member named Shrecknt brought up LiveOverflow’s Minecraft YouTube series. This made me start thinking about it again, so I decided to rejoin the server. This time, there was a player online. We talked a bit, and they invited me to a Discord server named “Server Scanning Inc”. Everyone in the server was super cool, and coincidentally Ada and Gildfesh also happened to be members there.

Original screenshot of my base, unfortunately it's too griefed now to take a better screenshot

I grinded on LiveOverflow’s Minecraft server for the next few days, building a house, acquiring good tools, building some infrastructure in the Nether, and making an underground potato farm. For fun, I set up a website that tracked Hermitcraft player activity and made a honeypot Minecraft server using node-minecraft-protocol that logged all the bots who pinged or joined. I got pings from Souper (who was in the Discord), Natekomodo, Shodan, and other IPs whose owners I couldn’t identify. Later I also got joins from some bots and other random people with scanners.

Making a scanner

I was jealous of how everyone in the server had their own scanner while I had cheated by using Shodan, so I decided to scan the internet myself. I already knew of masscan, which lets you send TCP SYN packets to every IPv4 address. Masscan also has support for grabbing basic banner data for various protocols such as HTTP and SSH, but it didn’t have support for pinging Minecraft servers, so I decided to attempt to add it myself. Unfortunately, masscan is written in C, and I have very little experience in C. While trying to write the code, I remembered Bithole’s blog post about scanning the internet for Minecraft servers, so I went to re-read it to see how they implemented scanning. As it turns out, they had done exactly what I was attempting to do, so I decided not to waste more time writing C and just “borrowed” their masscan fork. I had to make slight modifications, such as making it possible to scan for Minecraft servers on every port and not just 25565.

I already had a VPS on Oracle Cloud, which gives you a reasonably powerful server for free “forever”, so I ran the masscan fork on it. There were several issues. First, a scan took several hours, as opposed to the 5 minutes I was led to believe it would take. Second, it was missing most of the servers I expected to be there. Third, when I searched for my honeypot in the results, there were several servers that looked like they had the same MOTD as my honeypot but had different IP addresses.

For the first issue, I learned that the “millions of packets per second” advertised by masscan only applies to dedicated machines that aren’t virtualized. I also tried installing PF_RING, but that doesn’t work on virtualized machines either. Oh well, several hours isn’t that bad, but it wasn’t what I had hoped for.

The second issue was a much bigger deal: Oracle Cloud was dropping packets. When I scanned at lower rates fewer servers were dropped, but I also didn’t want to scan at lower rates. I found I could get Oracle to drop fewer packets by making the firewall rules stateless, but the number of dropped packets continued to be unacceptable. I’d have to switch to a different host.

The third issue was a bit annoying, and it turned out to be happening because crazy people decided to make “mirror” servers that open TCP connections with you and send you back everything you send them, so they ended up pinging my honeypot (which was on the same server as my scanner) when I scanned them. This would stop being an issue once I switched hosts.

My current scanner design runs on two servers. The first server runs a Python script that hosts a Discord bot and automatically executes masscan commands, piping the output into another Python script that uploads to my database. The second server has a cron job that executes every hour, running a custom scanner I made in Rust that rechecks every alive server already in my database and pipes its output into the same upload script as my masscan results. I also store previously online players in my database, so I can later search for the servers that specific players frequent.

Trying Hosts & Complaints

Since Oracle Cloud was dropping packets, I had to find a different host to do my server scanning on. First I tried Google Cloud, but they shut down my instance after a couple of days for “abnormal activity”. Then I tried Azure, but their website was broken and didn’t let me use my free credits. After those, I decided to try Linode. I’d heard people recommend Linode for scanning, so I thought it’d be fine, and it was, for about two weeks. I occasionally received abuse complaints on Linode, but I just replied explaining that I was simply scanning the internet for Minecraft servers and that everything I was doing was legal. I had to do this several times, and they were fine with me, until they decided they weren’t.

Final Linode abuse complaint

The final abuse complaint was actually sent by Hetzner for some reason, even though I wasn’t using Hetzner for scanning. After being banned from Linode, I deleted my account and went back to searching for a new host for my server scanning. I’d heard from Souper that Scaleway was very good for scanning, so I bought their $2/mo Stardust plan and went right back to scanning. Their 100 Mbps of bandwidth is a bit limiting, but the reliability of their service and the fact that I haven’t received a single abuse complaint make up for it.

One funny thing I encountered early on was a silly little website called AbuseIPDB. Their about page says the website is for “identifying IP addresses associated with malicious activity online”, but from my experience that’s not what it’s used for. Fortunately, reports on AbuseIPDB are purely cosmetic and don’t actually do anything other than serve as a fun leaderboard. At the time of writing, my primary scanning IP has 769 reports and I have friends with thousands of reports, despite doing nothing other than sending SYN packets to a few IP addresses.

Bots

So, remember that underground potato farm I made on LiveOverflow’s server? I could obtain about one inventory’s worth of potatoes per 30 minutes or so, but a player named 3j4 and I decided that just wasn’t going to cut it. Together we made a massive potato farm, which we dubbed Ireland, big enough that by the time you finished harvesting all of the potatoes it would be ready to be harvested again. We tried Baritone’s farm feature, but it left many potatoes to despawn, and I didn’t want to leave my Minecraft client open. So I decided to make my own potato farming bot that stayed online 24/7. I used Mineflayer to make it, and it worked by walking back and forth in the farm, instantly breaking and placing 5 potatoes at a time. With this, we were easily able to obtain millions of potatoes.

Potato farm

I also decided to make the bot bridge in-game chat to a channel in the Discord, so we could easily talk to players on the server. For a while, my bot was the only bot on the server. A bit later, my friend SushiPython set up his own bot that stayed near the server’s spawn point to monitor events there, so if someone attempted to grief we would know who it was. For a while these were the only bots, but after LiveOverflow leaked the IP several times, many more players started joining and making their own bots. First it was uptime_check, which joined every 5 minutes and sent “chat test”. Then it was EmmaIsSad, who made a bot that joined after uptime_check and sent “cat test <3”. This became a meme. There were also a few other people who made bots that linked chat to their own servers, and bots that added custom commands to chat. Funnily enough, one of these people was Shrecknt, the person who mentioned LiveOverflow in TheMisterEpic’s Discord and made me want to check on the server, and basically the reason I ended up doing all this.

Honeypots

Screenshot of a Discord channel showing several pingers

I briefly mentioned my honeypot server earlier, and how it uses node-minecraft-protocol to pretend to be a real server. In reality, the server is just a void world that kicks you after 15 seconds with an invite to the Discord server where the honeypot logs go. I also set the server MOTD and player list to something enticing, with the hope that a random person with a scanner would find the server and try to join. And it worked! At first I was getting joins from bots with randomized names such as “WwMygQ” and “5PhFta3”, but after a bit I got my first griefer who fell for the honeypot. They also joined the Discord server, so I DMed them. Their username was “Gangmaster”; apparently they were an independent griefer who thought my server was legit, but unfortunately for them, it was not. I also got joins from at least two Copenheimer users, including its lead developer, Orsond.

A lot of signs at spawn

I also gave the source code for my honeypot to some friends, and they modified it and set it up on their own servers. A member named Cleo made a really cool honeypot that proxies traffic from the honeypot to the real server, logging the players who join and adding some fake disconnections and lag to troll the people using it. At the time of writing, there are 6 live servers named “LiveOverflow Let’s Play”. There was also some drama that happened when a member decided to fill up the server with bots to prevent people from joining, but I’m not going to get into that here.

Conclusion

There were some other fun things that happened, such as:

  • Me realizing that Docker Minecraft Server has “minecraft” as the default RCON password (though I only found a few hundred servers with the port open with that password)
  • Me making a Minecraft server for exploring servers in my database by entering doors in an infinite hallway.
  • SushiPython and I confusing a lot of random Minecrafters by joining their servers.
  • Me making a bot that joined every 1.19.2 offline mode server and tried joining as past usernames until it found an admin.

If you read this entire blog post, I encourage you to make your own server scanner and have your own adventures. As a little sendoff, here’s a few completely random servers from my database for you to explore:

If you want more, make your own scanner. See you next time, when I decide to publish another blog post in two years ;)

matdoes.dev markdown

A couple years ago, when I was creating the matdoes.dev blog, I wrote a somewhat powerful Regex-based markdown system to let me write blog posts more easily. Although I’ve barely written any posts, I’m still proud of it. Also, this post is mostly just a reference for myself, lol. I present: matdown™


Relative anchor: [matdoesdev](/blog)

External anchor: [matdoesdev](https://matdoes.dev) (external anchors have target=_blank so they open in new pages)

Normal link: https://matdoes.dev

Code block: ```py print('code') ```

Inline code: `code`

Block quote: > text

Italic: *text*, bold: **text**, italic & bold: ***text***

Horizontal center: ||text||

Titles: # h2, ## h3, ### h4, #### h5, ##### h6, ###### h6

Horizontal rule: ---

Image: ![description](https://image)

The Story of ReportScammers

I wrote this story on the Hypixel Forums a while ago, but I realized it would be a good idea if I posted it on my blog too.

Intro

ReportScammers was a robot on the Hypixel SkyBlock Forums that automatically replied to posts where people were complaining that they got scammed. It all started on April 27th, 2020. I was bored and wanted to make a Hypixel Forums bot. At first, I wasn’t sure what I wanted it to do. Then I thought, “what’s a task that humans do often that could be easily automated?”: complaining about people getting scammed, of course.

madcausebad11

I didn’t do anything with this idea until a couple weeks later on May 14th, when I remembered it and was actually motivated to create it. I asked around on the SkyBlock Community Discord for what it should be called and what it should do, and I decided on calling it madcausebad11 (name chosen by @TatorCheese) and making it say “thats crazy but I dont remember asking” (@Bliziq chose that one) to all posts that mentioned being scammed.

Screenshot of a post on the Hypixel Forums where a user named madcausebad11 says 'thats crazy but i dont remember asking'

When that had been decided, I started working on the code. It was written in Python, using BeautifulSoup to scrape the web pages and aiohttp to make the requests. After an hour of writing code, madcausebad11 was working.

Less than an hour after the bot started working, it got banned for the reason “spam”.

reportScammers

A day after madcausebad11 got banned, I decided to make it again, but better. This time, I was going to make it look like a human. I added more delays, random messages, and a profile picture made in MSPaint, and I fixed more false positives. This became what you all (probably) know, and (maybe) love: reportScammers. This version of the bot also wasn’t toxic, as it just said “Please report scammers at hypixel.net/report” (or some variation of that) in reply to all messages complaining about being scammed, and people didn’t hate it that much this time.

I checked the forums often on this account, so if anyone talked about the bot I would be able to respond. There were some people that called reportScammers a minimod and a bot, but it was fairly unknown, so most people didn’t care. There were, of course, people that suspected reportScammers was a bot. Every time I saw one of these messages, I responded manually, sometimes pretending to get angry about people thinking it was a bot, even though it was. There were also many posts baiting the bot to reply by making the title of the post a variation of “I got scammed”, even though they hadn’t been. To combat this, I made the bot only reply to messages from new members, as well-known members were likely only trying to post farm.

I had a few problems making the bot work well though, such as the Cloudflare captcha screen, meant to prevent bots from scraping the forums. However, reportScammers wasn’t a bot, so I found a Python library meant to bypass it and tweaked the source code to make the library asynchronous. Near this time I also updated the logo for reportScammers in Photoshop, but still with the MSPaint vibe.

Dafty = reportScammers???

On June 9th, a member of the SkyBlock Community Discord followed reportScammers and told me “follow me back”, “thats rude”, so I did. This user was @Dafty. I pointed out how the only person reportScammers was following was Dafty, so people would think they ran the account. We got @pigeo to write a forum thread “exposing” reportScammers, then some people started making their own forum threads, and then I wrote my own forum thread titled “Addressing the reportScammers situation” on my main account. However, we had to go further. Dafty asked @SecureConnection to change the name of their alt to reportScammers, so we could link the forum account to a Minecraft account and look even more human. Around this time I also gave Dafty the login details for the account, so he could help reply to messages faster and farm more messages. Dafty also created a Minecraft skin for the account, which was simply a Steve holding a Hypixel logo.

The death of reportScammers

Around this time, many parody accounts started popping up, such as reportScammersbrother, scam-bot, NoPublicShaming, and NoTrollingBotXD. One day, I noticed reportScammers had suddenly stopped replying to posts. At first I thought it must be a glitch in my code, but when I looked further, I couldn’t find any recent posts complaining about being scammed at all. Maybe people just stopped getting scammed? I thought this was the reason, but no staff members wanted to confirm it. I made another alt account to test, and found out that the admins had disabled new members from creating posts with the word “scammed” in them. I sadly went and disabled the code running the bot, but wanted to make one last post as reportScammers. This thread is the first and last thread by reportScammers, created by @matdoesdev.

Uncovering the Discord Twitch Bots

So a few days ago my friend Slip got a DM on Discord from this “Twitch” bot asking him to invite it to his servers as well as to join theirs. The bot’s message claimed that Discord and Twitch had partnered up to give their users free Nitro Games and free Twitch Prime.

Twitch partnership scam dm

It obviously looked fake, so Slip created a testing server and added me and some friends to help. Upon joining, the fake Twitch bot DMed everyone in the server with the same message it had sent to Slip. It looked like some sort of social engineering worm, but it hadn’t done anything bad yet, so we revoked the bot’s perms and left it in the server.

Free nitro scam dm

When I joined the server it linked, it looked like some sort of bad giveaway server, with giveaway channels and even a rules and TOS channel. Unfortunately for us, there were no channels that we could talk in to inform other people. Soon after joining, we got another DM from a different bot with the same name. Again, it contained a link to join a Discord server. However, this time, instead of saying it was from Twitch, it took a more straightforward approach, saying to join for “Nitro / Nudes”.

It was getting late, so we went to sleep. When we woke up, we were greeted by at least 4 other bots with the same message and name, so we just invited all of them to our server! The old bots were now offline, and for some people the bots’ names were displayed as things such as “thisisaspambot”, “Fake Twitch Bot”, and “Fake bot”. We later found out that this was in fact the doing of the Discord Trust and Safety team, but they didn’t do it very well, because some of the bots could still DM people and the new names didn’t always show up.

Scam bot list, new names

Another interesting thing was a MediaFire link that one of the bots DMed to Slip. He shared it with us; the file claimed to be an executable containing a Nitro generator, but it looked obviously fake, evident from the instructions text file provided.

Free nitro download DM

How to use the Discord Generator :

1. Disable anti-virus, and open it.
2. When you opened it, press on ''Generate'' and good luck!
3. It says its a virus because this generator generates accounts, so obviously it will say its a virus, but its not.

If its not working, it means u dont have the good version.
Good Luck!

What even is that grammar…

Another notable thing was that when we searched up the owner of the “Free Nitro” Discord server on YouTube, it returned their channel. One of the videos was a free Nitro generator, leading to the same exact MediaFire link, so we knew there was a definite link with that user.

Anyway, we booted up Windows Sandbox and ran the virus with a process monitor in the background. There were a bunch of references to Python, so it was likely a Python script compiled into an exe.

Process monitor

I wasn’t sure what it was compiled with, so I tried running unpy2exe on it, but that returned an error telling me to use pyinstxtractor instead, as the exe was compiled with PyInstaller. After we ran pyinstxtractor on the exe, it produced a folder with a bunch of pyc (Python bytecode) and pyd (compiled extension) files.

PyInstxtractor

Created on March 2nd

It looks like it was created on March 2nd.

No matter what we tried, we couldn’t decompile it into normal readable Python, so we just analyzed the bytecode using the dis Python module. There were a bunch of references to tokens and browser LocalStorage, which is where the token is stored. The malware sent an HTTP request to api.ipify.org to grab the victim’s IP address, and it also collected the user’s email, phone number, and Nitro status.

There was also a funky looking base64 string, which turned out to be a Discord webhook URL that the script sent the user’s details to.

Once we got hold of the webhook, things got spicy.

We tidied up the testing server a bit and hid our discussion channels, then made the invite look as appealing as possible.

Who wouldn't click that?

Using a little webhook spamming script I wrote, we spammed @everyone, as well as an invite to our server, and left it running overnight.

In the morning, we woke up to this:

They joined!

They were the admins of that free Nitro server.

We also found out that they had deleted their webhook, which meant we couldn’t spam them anymore, but it also meant they wouldn’t get the tokens of any new users.

The first two quickly left, but one sent us a message before leaving.

'wtf are you'

We asked kzh to join back again.

Asking them to rejoin

This led to this hilarious conversation.

Screenshots of the conversation

In summary, these guys are just terrible clowns trying to get tokens from unsuspecting Discord members.

And that ends the tale. We still have the server ID and the channel ID that the webhook was created in, as well as the Discord tags of all the members, and we’ll continue to spam any future webhooks that the Twitch bots send us.

:)

What Are Domain Hacks?

Domain Hack Example

A domain hack is a domain in which the top-level domain (TLD) and the second-level domain (SLD) are combined to make up a word or phrase. For example, matdoes.dev is a domain hack for “mat does dev”. Despite the name, domain hacks are not security-related, and they are completely legal.

Most domain hacks use country code top-level domains (ccTLDs); for example, .it is for Italy, .am is for Armenia, etc. Some companies even purchase their own custom TLDs from the Internet Assigned Numbers Authority in order to create a hack for their domains. Most notable is goo.gle, which was created by Google as a domain hack for their website.


Why Use a Domain Hack?

An advantage of using a domain hack is that your domain is much shorter and therefore easier to remember. Many URL shortening sites, such as bit.ly, goo.gl (Google), and youtu.be, use domain hacks to make their URLs shorter.

Domain hacks are more fun than normal domains, too, which increases the chance of people clicking on them in search results.

How to Choose a Domain Hack?

Finding a good domain isn’t always easy, so I’ve created a tool hosted on Repl.it that helps you find domain hacks: Domain Hack Generator

At the moment, it uses every TLD currently in existence, which may not be what you want, since some top-level domains cannot be used by most people, as they require you to live in a certain area or work for a certain organization. You can customize this by adding to or removing from the tlds.txt file.

My tool also checks whether a domain is already taken by seeing if it has any DNS records. Also, be aware that some TLDs are stupidly expensive; for example, .ng domains can go for up to $50,000.
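The check itself is just a resolution attempt. Here’s the idea as a Rust sketch (DNS alone misses registered-but-unresolved domains, which is why it’s only a heuristic):

use std::net::ToSocketAddrs;

// If the name resolves, someone almost certainly owns it already.
fn probably_taken(domain: &str) -> bool {
    format!("{domain}:443")
        .to_socket_addrs()
        .map(|mut addrs| addrs.next().is_some())
        .unwrap_or(false)
}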

Who is mat?

Welcome to mat does dev. You might have some questions, so I’m here to answer them.

mat does dev


Who is mat?

I am mat. I am a human that lives somewhere on a planet called Earth, you might’ve heard of it.

What do you do?

I do dev. To clarify, I mean dev as in software development.

Why do you write your name in lowercase?

Because I can and no one can stop me.

What type of stuff do you make?

I make a variety of different tools, and you can see some of those things in my project list on this website. The list isn’t complete though, as a lot of the things I make aren’t particularly presentable.

What programming languages do you use?

I mainly use Python, as it’s the language I’m most comfortable writing with. I’m also proficient with JavaScript, HTML, and CSS. I also know limited amounts of C++, C, Go, and Java.

How did you make this website?

2022 update: This website was rewritten in Svelte. The backend for this website was written by me in pure Python with beautiful asynchronous aiohttp.web and Jinja2. The frontend was made with VanillaJS.

How can I contact you?

You can contact me through Matrix (@mat:matdoes.dev).