Chalk Jotto is my first Android game. I got the idea about a year ago, threw together a playable game over a weekend, and polished it up over the course of the next week after work. But then I started losing interest. I had so many ideas for features, and I tried adding them all at once. I got distracted and kind of forgot about the project, what with moving halfway around the world to Copenhagen for university, and then dropping out of my Master's program to work at an awesome startup instead.

But finally, I decided that I just needed to get it done. I stripped out every half-baked feature I had been adding. Every time I worked on the project, I added more TODO items than I resolved. I was striving for perfection, and I was overdoing it.

Achievements and leaderboards were done, but I didn't want to deal with GDPR compliance, so I removed them. For the same reason, I also didn't include ads or Crashlytics. The app gathers absolutely no data.

I just wanted to ship a fun word game, not start a business, so I removed everything that wasn’t fun. In only a few hours I had something ready to publish.

I can't believe I let a project sit 99% finished for so long. Next time I start a project, I'm going to focus on finishing a minimum viable product, putting just a bit of polish on top, and then releasing it. I can add features later if I feel like it. And indeed, releasing this app today has given me the motivation to add some of those other feature ideas I had been working on.

I may re-add Achievements at some point since I had gone to the trouble of drawing icons for every achievement.

You can download the app here.

Or view the source code here.

I recently revisited one of my oldest real projects, Captain Markov. It was a Twitter bot that used a Markov chain to generate fake captain's logs from Star Trek. I learned a lot building it, but I've learned much more since then. I decided to rewrite it in Kotlin and do some general cleanup along the way.

Kotlin and the considerable experience I've gained these last few years allowed me to greatly shrink the size and complexity of the project. A whole system of classes I had been using to scrape scripts was compressed down to a single short function with much cleaner and more reliable output. I also simplified the core MarkovChain class and fixed some long-standing issues with it.

But I wasn't satisfied just rewriting an old project without doing anything new with it, so I decided to make it generate something else as well.

I had been reading a lot of SCP Foundation articles at the time, and noticed that the articles are all very structurally similar to one another. Additionally, the vocabulary and writing style is remarkably consistent. I believe it lends itself better to building a Markov model than Star Trek did.

So I found the SCP Foundation wiki's robots.txt, downloaded their sitemap, and scraped all the articles. I made some changes to the bot to have it generate two types of paragraphs: "Containment Procedures" and a "Description". The outputs were amusing, so I generated 15,000 fake SCP articles and uploaded them to my website.

Next, I downloaded the page of a random article, stripped it down, and added some javascript to fetch a randomly generated article from my server and display it to the user.

Articles 1-3000 were generated with a chain length of 4, and articles above that were generated with a chain length of 3. A longer chain length generally means the outputs will be more coherent, because longer stretches of text come straight from the source material.

I also wrote one of the articles myself, but you will only see it if you cause my PHP code to throw an exception.

Check it out here

I wanted to dip my toes into machine learning, and at the time, I was having fun playing a simple web game called Universal Paperclips, based on the famous Paperclip Maximizer thought experiment in which an AI designed to make paperclips eventually destroys humanity in its pursuit of more paperclip-making resources. So I decided to make a literal paperclip-maximizing AI that would attempt to play the game. I admit the irony of the idea was a major factor in my decision to pursue it. Besides, what could go wrong?

I searched around for some Javascript machine learning libraries, and ultimately settled on Brainwave.js, a JS utility that allows for easy creation, management, and training of configurable neural nets which improve themselves through the use of a genetic algorithm rather than the typical choice of gradient descent.

At first I set it up so it was only playing one game at a time, but I quickly realized that training would take much too long. So I refactored my project to run 20 instances of the game at a time in the same browser window. I spent the next week leaving it running for a day at a time, then tweaking input parameters to try to get the AIs to learn. I eventually succeeded in getting one of the AIs to learn how to manage the price it was selling its paperclips for, so that it could afford to buy more wire instead of running out of resources. A bit short of my lofty goal of converting the entire digital universe into paperclips, but I was satisfied with my results anyway.

One of the biggest mistakes I made early on was having far too many inputs. Instead, I did some basic manipulation of the inputs to decrease how many things the AI needed to pay attention to. For example, instead of including the current paperclip price, the number produced per second, the number sold per second, and whether or not the increase/decrease buttons were available, I performed a pre-calculation to determine the net flow of paperclips. Now the AI only had to figure out that if the value was negative it should increase prices, and if it was positive, it should decrease prices.
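That pre-calculation is trivial, but it is the kind of feature engineering that made training feasible. A sketch in Java (the actual project is JavaScript, and these names are mine, not the game's):

```java
// Illustrative sketch of the input reduction described above.
// Four raw inputs are collapsed into one signed value whose sign
// is all the network needs to learn a pricing policy from.
public class ClipInputs {
    /** Net paperclip flow: positive means inventory is piling up. */
    static double netClipsPerSecond(double producedPerSecond, double soldPerSecond) {
        return producedPerSecond - soldPerSecond;
    }

    /** The policy the AI had to discover: +1 = raise price, -1 = lower it. */
    static int priceAdjustment(double net) {
        if (net < 0) return +1;  // selling faster than producing: raise price
        if (net > 0) return -1;  // inventory growing: lower price to sell more
        return 0;
    }
}
```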

I’m excited to try more machine learning projects in the future. Next time I’d like to work on something more precise, rather than randomly stirring a pile of linear algebra until it can play a game.

Source code

This was originally a final project for my web development class, but I took it much farther than was required.
Let’s go over the functionality, and then I’ll dive into the implementation.
Here is a link to the page, so you can play around with it. Unless it’s daytime PST, you probably won’t see any buses.

The map shows the live location of all buses currently on the road. You can use the sidebar to filter out bus routes you don't care about. Clicking on a bus toggles drawing the route it is following on the map. Clicking on a bus stop gives you directions to it from your current location, as well as a list of buses that are scheduled to stop there soon, and how delayed they are.

Originally we were supposed to build some rudimentary functionality by divining how Spokane Transit Authority's (STA's) API worked for their unreleased route planning site by monitoring network requests. But their API sucks. Or at least it did at the time. So I did a little digging and discovered that public transit providers around the world use something called the General Transit Feed Specification (GTFS), which was created by Google. There is also GTFS-R, the real-time variant. STA had set up their own transit feed loosely following this specification and exposed its data through their REST API. I could also access the feed directly via some *.pb files on STA's site. However, when opened, they looked basically like random binary data.

Thankfully, Google has GTFS Realtime Bindings on GitHub. Now instead of figuring out how to work STA's clunky API, I could pull some of Google's code and extend it into a REST API of my own to use on the project. Google provided some PHP code that downloaded the latest pb file from the GTFS-R feed and parsed it into a slightly usable, poorly documented PHP object. I spent a few hours poring through the several-thousand-line PHP file, boiling it down into something I could actually comprehend (doc1, doc2). I then turned the entire PHP object into an associative array that could be easily encoded as JSON. I also implemented some caching so I wouldn't be putting too much strain on STA's servers.
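The caching layer I wrote was PHP, but the idea is simple enough to sketch in Java with made-up names: refresh a local copy of the feed only when it is older than some threshold, and serve the cached bytes otherwise.

```java
import java.io.IOException;
import java.nio.file.*;
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: serve a cached copy of the feed unless it is stale,
// so repeated requests to our endpoint don't hammer the agency's server.
public class FeedCache {
    private final Path cacheFile;
    private final Duration maxAge;

    public FeedCache(Path cacheFile, Duration maxAge) {
        this.cacheFile = cacheFile;
        this.maxAge = maxAge;
    }

    boolean isStale() throws IOException {
        if (!Files.exists(cacheFile)) return true;
        Instant modified = Files.getLastModifiedTime(cacheFile).toInstant();
        return Instant.now().isAfter(modified.plus(maxAge));
    }

    byte[] get(Downloader downloader) throws IOException {
        if (isStale()) {
            Files.write(cacheFile, downloader.fetch()); // refresh the cache
        }
        return Files.readAllBytes(cacheFile);
    }

    interface Downloader { byte[] fetch() throws IOException; }
}
```

A short max age (say, 30 seconds) keeps the data fresh while capping upstream traffic at one fetch per interval, no matter how many clients hit the endpoint.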

As the front end progressed, I would break off the pieces I needed into an endpoint.

Some run of the mill Google Maps API documentation/tutorials later and I had a functional (and useful!) client and server.

Source Code

JSON Resume is an open source initiative to create a JSON-based standard for resumes.
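For a sense of what the format looks like, here is an abridged, made-up example following the JSON Resume schema (field names are from the spec; the values are placeholders):

```json
{
  "basics": {
    "name": "Jane Doe",
    "label": "Software Developer",
    "email": "jane@example.com",
    "summary": "Developer with a focus on Android and backend work."
  },
  "work": [
    {
      "name": "Example Corp",
      "position": "Developer",
      "startDate": "2017-01-01",
      "highlights": ["Shipped an internal inventory tool."]
    }
  ],
  "skills": [
    { "name": "Java", "keywords": ["Android", "Swing"] }
  ]
}
```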

I find the idea very interesting, and I used it to make my resume for this site. I think creating a standardized format for resumes is a worthy goal. I can envision job search sites storing your resume data and serving it up to potential employers, who would be able to view it in a resume viewer program or integrate the data directly into their own systems. No iffy PDF scraping required. No uploading your resume AND filling out a million form fields. The UX would be superior for both job seekers and job providers.

The raw JSON for my resume can be seen at this link or by sending an HTTP GET request to

After organizing all of my data into JSON format, I found a program called HackMyResume to convert the JSON file into various usable formats and style them with JSON Resume themes. I chose HTML and PDF formats for use on this site.

One thing that annoyed me to no end at my job at VIP Traders was the large stack of paperwork with apparently no corresponding folder to put it into. Then I discovered that hidden away in everyone’s desk is a pile of folders that have slipped through the cracks and do not get filed away in the normal filing system. Because these random stacks of folders lacked any sort of organization, checking every piece of paperwork against every folder in everyone’s desk would be a massive undertaking. For a human anyway.

So I decided to make a computer do it for me.

I immediately started work on a simple database client, which I called "Whose Desk Is It Anyway?" It would keep a list of the IDs of all the paperwork in the stack I couldn't file, and of the IDs of all the folders in people's desks, and let me know which papers need to go to whose desk.

For the sake of simplicity, I used SQLite (Xerial) and stored the .db file on a network drive so my coworker and I could use the program simultaneously.

As we began using it, I began to flesh out the UX by adding convenience features.

I integrated the program with the company's old Lotus Approach database so we could verify information about the paperwork while we added it to our program.
I added import and export features so that a list of stock numbers generated in one of the other programs I wrote could easily be added to or deleted from a location in this program's database.

This is by far one of the most personally useful projects I’ve ever done, making what I do at my job so much easier. It has been instrumental in finding a home for all the orphaned paperwork.

The SQL database uses a simple two column table, relating location (TABLE_NAME) to paperwork ID (VIN).
The program finds matches using a simple SQL query:
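The query itself isn't reproduced here, but in spirit it pairs rows that share a VIN across different locations. The same matching logic, sketched in plain Java with hypothetical names purely for illustration:

```java
import java.util.*;

// Illustrative stand-in for the SQL matching step: given (location, VIN)
// rows, collect, per location, the VINs that also appear somewhere else.
// The real program does this with a query against the SQLite database.
public class Matcher {
    static HashMap<String, ArrayList<String>> findMatches(List<String[]> rows) {
        // Count how many locations each VIN appears in.
        Map<String, Integer> vinCounts = new HashMap<>();
        for (String[] row : rows) vinCounts.merge(row[1], 1, Integer::sum);

        // Keep only the VINs seen in more than one location, grouped by location.
        HashMap<String, ArrayList<String>> matches = new HashMap<>();
        for (String[] row : rows) {
            if (vinCounts.get(row[1]) > 1) {
                matches.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
            }
        }
        return matches;
    }
}
```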


The results from the query are loaded into a HashMap<String, ArrayList<String>> and fed into my favorite snippet of code from this project.

I needed to sort the result set by first sorting each Array in the HashMap, and then sorting the HashMap based on the first element in each sorted Array. Thanks to Java 8, I managed to do it in 4 lines.

ArrayList<Map.Entry<String, ArrayList<String>>> results = new ArrayList<>(resultsMap.entrySet());
results.sort(Comparator.comparing(o -> {
    Collections.sort(o.getValue()); return o.getValue().get(0);
}));

This code first converts the HashMap to an ArrayList of entries and uses the built-in sort method with a generated Comparator, which is configured to sort the inner ArrayList (using the built-in sort method and String's default ordering) and return its first element for comparison. The performance impact of repeatedly sorting the inner lists was not noticeable, since the HashMap here usually only contains ~20 keys with about 2 to 4 elements in each list.

Source Code

When I started working at VIP Traders, my job was to organize paperwork into folders. I would eventually transition into writing software to automate large portions of my job and others', and the first step in that transition was an Android app.

I noticed that my coworkers were having to walk around our parking lot writing down the VIN numbers of the cars on a piece of paper, and then typing them into a spreadsheet. If that isn’t a prime candidate for automation, then I don’t know what is. So I wrote a demo project at home, and presented it to the CEO. They agreed that it would be an improvement, and I was reassigned from filing to software development. As per their requested specs, the app scans a VIN barcode, and allows the user to submit all the scanned VINs to a new Google Sheets spreadsheet. Pretty nifty.

Point the camera at a bar code

The code is scanned and data is entered by the user

Data is submitted to a spreadsheet

While I was working on the app, I noticed another coworker importing photos from a camera and renaming the files after the VIN, which he was typing in manually for each vehicle. So I extended the app to run on the company's Samsung Galaxy Cam 2s and added a feature for capturing and storing photos.
Once a VIN has been scanned in, a button becomes available that leads to a page prompting the user to take the photos they are supposed to take. The photos are saved to the camera and named with a naming convention based on the VIN.

This was a very fun project for me to develop, as it really pushed the boundaries of my abilities. I learned a ton about Android app development, and "invented" what I would later learn are called asynchronous callbacks. At the time, I called them "Post Execution Runnables".

final CreateSpreadsheetTask task = new CreateSpreadsheetTask(sheetName.getText().toString(), context);
task.setPostExecuteRunnable(new Runnable() {
    @Override
    public void run() {
        if (task.suceeded()) {
            // Do stuff
        } else {
            Toast.makeText(context, "Could not create spreadsheet", Toast.LENGTH_LONG).show();
        }
    }
});

I thought this solution was very cool and was not at all surprised to find that it's not a new idea in the slightest.

Special thanks to bees4honey for providing the open source VIN scanning C library needed to scan in VIN barcodes.
Unfortunately, I cannot share a link to the GitHub repository for the project, as it contains sensitive information for the company.

Understanding this post requires at least a cursory understanding of what Reddit is. If you don’t know what Reddit is, I highly recommend this video by CGP Grey.

When I realized that almost every subreddit has links to subreddits about similar topics in an easily accessible part of the page, thanks to Reddit's REST API, I knew I had to find a way to make a map of the connections between all of the subreddits on Reddit. Not because it would be useful, but because it would be cool.

So, I whipped up a web crawler in Java and used Gson to parse the JSON responses. Related subreddits were added to a Queue and retrieved in FIFO order. A Graph data structure of Nodes and Edges is maintained until the crawling is done, at which point it is exported to CSV format and imported into a program called Gephi, which I used to build the following visualizations.
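The crawl bookkeeping can be sketched like this. A minimal, illustrative version in Java (not the project's actual classes): a FIFO queue of subreddits to visit, plus an edge list that can be dumped as the simple Source,Target CSV that Gephi accepts.

```java
import java.util.*;

// Illustrative crawl bookkeeping: visit subreddits breadth-first,
// record the links between them, and export an edge list for Gephi.
public class SubredditGraph {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> visited = new HashSet<>();
    private final List<String[]> edges = new ArrayList<>();

    /** Queue a subreddit for crawling, skipping any we've already seen. */
    void enqueue(String subreddit) {
        if (visited.add(subreddit)) queue.addLast(subreddit);
    }

    /** Next subreddit to crawl, in FIFO order (null when done). */
    String next() { return queue.pollFirst(); }

    /** Record a link and queue the target for crawling. */
    void addEdge(String from, String to) {
        edges.add(new String[]{from, to});
        enqueue(to);
    }

    /** Gephi accepts a plain Source,Target edge list. */
    String toCsv() {
        StringBuilder sb = new StringBuilder("Source,Target\n");
        for (String[] e : edges) sb.append(e[0]).append(',').append(e[1]).append('\n');
        return sb.toString();
    }
}
```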

There are nearly a million subreddits, each one with an average of 10 connections, so I had to trim down the data set to a more manageable size. I chose to limit it to only subreddits with more than 1,000 subscribers. This leaves some subreddits stranded, with no links to them from the central cluster, and as such they form a sort of reddit “Oort cloud”. Nodes are subreddits, edges are links between subreddits, and node size is determined by number of subscribers to that subreddit. I ran an algorithm called OpenOrd to form the clusters, and used those clusters of subreddits with high mutual linkage to determine node color.

Then, I ran an expansion algorithm to spread out the densely packed clusters to make it easier to see what is going on.

Next, I hid the edges between nodes, again in the interest of clarity.

I did some poking around in Gephi to determine which category of subreddits each cluster comprised, and labeled them in Photoshop. Probably the most interesting trend I found while poking around was that gay porn subreddits tended to link to LGBTQ support group subreddits, which in turn linked to self-improvement subreddits, which explains the proximity of the porn cluster to the self-improvement cluster.

In this image I enabled node labels for subreddits with more than 10,000 subscribers. All of the images in this post are high resolution (4000×4000), so if you open them in a new tab you'll be able to zoom in very far and read the labels much more easily.

The algorithm that generates these graphs is actually a sort of physics simulation, so watching it simulate the graph looks very cool. Below are a few gifs of the process in action. If they aren’t loading on your device or you would like to be able to zoom in on them, click “View full resolution”

View full resolution

View full resolution
I also had the scraper gather the creation date for the subreddits, and made an animation where the output was filtered by year, in order to display the growth of reddit over time.

View full resolution

View full resolution

The final addition to this project was a map of the power moderators of Reddit. Because of the extreme number of edges in this case, I limited the scope of the scraping and visualization to only the moderators of the 49 default subreddits. Each moderator got a node, and each edge means that those two moderators share a subreddit. The more subreddits two moderators share, the higher the edge weight. The more subscribers a moderator is in charge of, the larger their node.

The 49 default subs have a total of 2,627 moderators, with 2,673,294 connections between them. The top 10 moderators on Reddit are each in charge of between 43 million and 200 million users. Again, colored clusters represent high degrees of linkage, meaning each small group of moderators have all added each other to their respective subreddits.

As a follow-up to this project, I also downloaded Wikipedia's SQL database and parsed through it to generate a similar data set. However, with over 5 million articles, each easily containing over 50 links, the data set was too large for Gephi to handle. I was unfortunately not able to come up with a satisfactory way of filtering down which articles to include, and eventually lost interest and moved on to more interesting projects.

Source code is available here

When I first came across bots running on Markov chains on the internet, I knew I had to make my own. I decided that a Twitter bot that generates and tweets Captain's Logs like those in the Star Trek franchise would be pretty cool. So I spent some time learning how to build the various components I would need: a web scraper for gathering the scripts, the code required to interface with Twitter, a MarkovChain class, and the data structures and UI it would rely on.
The most interesting parts of this project were the MarkovChain class and, of course, the hilarious output. The account gained some traction when Wil Wheaton, an actor on the show, followed it and retweeted several of its posts.
First, let's look at some examples, and then we'll dive into how it works.

In general, a Markov Chain is a statistical model that relates a word (or character) to the probability of any other word (or character) occurring after it. The core data structure of the program used to accomplish this is a Hashtable<String, Vector<String>>.
Every line of the script is broken down and fed into the Hashtable. In order to increase coherence, the bot uses a chain length of 3. To achieve this, a line is broken down in the following manner:
Input line: “Captain’s Log, stardate 8654.5, this is an example”
Output strings: “Captain’s Log,”, “Log, stardate”, “stardate 8654.5”, “8654.5, this”, “this is”, “is an”, “an example”
That way, at a minimum, any 3 words in order in the output are guaranteed to have occurred at least once in the original scripts.
When these strings are fed into the Hashtable, the result is that any two words can be used as a key to retrieve a list of word pairs that have occurred after them. Note that because this is a Vector (basically just an Array) and not a Set, duplicates will exist. The number of duplicates of an entry affects the probability of selecting that path, thus satisfying the Markov property.

If “Captain’s Log,” is used as the starting word pair, it will select randomly from the list of word pairs that have occurred after that. For example, the Vector for that “seed” might be {“Log, stardate”, “Log, supplemental”}.
It is easy to see how this method of choosing the next word in the phrase based on how likely it is for that word to occur after the previous word in the source data set can lead to a coherent and humorous output.
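Putting the pieces above together, a minimal, illustrative version (not the bot's actual code) of building the Hashtable<String, Vector<String>> and walking it might look like this:

```java
import java.util.*;

// Illustrative Markov chain with chain length 3: keys are word pairs,
// values are the word pairs that have followed them in the source text.
public class MarkovDemo {
    static Hashtable<String, Vector<String>> build(List<String> lines) {
        Hashtable<String, Vector<String>> chain = new Hashtable<>();
        for (String line : lines) {
            String[] w = line.split("\\s+");
            // Overlapping pairs: (w0 w1) -> (w1 w2), (w1 w2) -> (w2 w3), ...
            for (int i = 0; i + 2 < w.length; i++) {
                String key = w[i] + " " + w[i + 1];
                String next = w[i + 1] + " " + w[i + 2];
                chain.computeIfAbsent(key, k -> new Vector<>()).add(next);
            }
        }
        return chain;
    }

    static String generate(Hashtable<String, Vector<String>> chain,
                           String seed, int maxPairs, Random rng) {
        StringBuilder out = new StringBuilder(seed);
        String current = seed;
        for (int i = 0; i < maxPairs; i++) {
            Vector<String> options = chain.get(current);
            if (options == null) break; // dead end: no pair ever followed this one
            current = options.get(rng.nextInt(options.size()));
            // Pairs overlap by one word, so append only the new second word.
            out.append(" ").append(current.split(" ", 2)[1]);
        }
        return out.toString();
    }
}
```

Because duplicates are kept in the Vector, picking a uniformly random element naturally weights common continuations more heavily, which is exactly the Markov property described above.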

I also built a GUI for customizing exactly how the scripts were parsed, allowing me to generate dialog from specific characters, or even scene change notes. I also experimented with a Treknobabble generator to generate phrases using as few words from the top 10,000 most common words as possible, but the results were less than satisfying. Outputs usually just contained people's names and other "uncommon" words, rather than unique Star Trek technical terms.

To see more tweets, visit the bot's Twitter account.
To see the code, visit the GitHub page for the project.