JSON Resume is an open source initiative to create a JSON-based standard for resumes.

I find the idea very interesting, and I used it to make my resume for this site. I think creating a standardized format for resumes is a worthy goal. I can envision job search sites storing your resume data and serving it up to potential employers, who could view it in a resume viewer program or integrate the data directly into their own systems. No iffy PDF scraping required. No uploading your resume AND filling out a million form fields. The UX would be superior for both job seekers and job providers.

The raw JSON for my resume can be seen at this link or by sending an HTTP GET request to larsbenedetto.work/resume/api/

After organizing all of my data into JSON format, I found a program called HackMyResume to convert the JSON file into various usable formats and style them with JSON Resume themes. I chose HTML and PDF for use on this site.

One thing that annoyed me to no end at my job at VIP Traders was the large stack of paperwork with apparently no corresponding folder to put it into. Then I discovered that hidden away in everyone’s desk is a pile of folders that have slipped through the cracks and do not get filed away in the normal filing system. Because these random stacks of folders lacked any sort of organization, checking every piece of paperwork against every folder in everyone’s desk would be a massive undertaking. For a human anyway.

So I decided to make a computer do it for me.

I immediately started work on a simple database client I called “Whose Desk Is It Anyway?”. It keeps a list of the IDs of all the paperwork in the stack I couldn’t find folders for, and of the IDs of all the folders in people’s desks, and tells me which papers need to go to whose desk.

For the sake of simplicity, I used SQLite (via the Xerial JDBC driver) and stored the .db file on a network drive so my coworker and I could use the program simultaneously.

As we began using it, I began to flesh out the UX by adding convenience features.

I integrated the program with the company’s old Lotus Approach database so we could verify information about the paperwork while we added it to our program.
I added import and export features so that a list of stock numbers generated in one of the other programs I wrote could easily be added to, or deleted from, a location in this program’s database.

This is one of the most personally useful projects I’ve ever done, making what I do at my job so much easier. It has been instrumental in finding a home for all the orphaned paperwork.


The SQL database uses a simple two-column table relating location (TABLE_NAME) to paperwork ID (VIN).
The program finds matches using a simple SQL query:

SELECT VIN FROM FOLDERS GROUP BY VIN HAVING COUNT(TABLE_NAME) >= 2;
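
For illustration, here is a minimal sketch of running that query against the shared .db file with the Xerial driver. The drive path is made up, and the real program does more with the results than just print them:

import java.sql.*;
import java.util.*;

public class MatchFinder {
    public static void main(String[] args) throws SQLException {
        // Hypothetical path to the shared .db file on the mapped network drive
        String url = "jdbc:sqlite:S:/WhoseDesk/folders.db";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT VIN FROM FOLDERS GROUP BY VIN HAVING COUNT(TABLE_NAME) >= 2")) {
            List<String> matchedVins = new ArrayList<>();
            while (rs.next()) {
                matchedVins.add(rs.getString("VIN"));
            }
            // The program then looks up every TABLE_NAME each of these VINs appears
            // under, which is what fills the HashMap described below.
            System.out.println("Paperwork with a matching folder: " + matchedVins);
        }
    }
}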

The results from the query are loaded into a HashMap<String, ArrayList<String>> and fed into my favorite snippet of code from this project.

I needed to sort the result set by first sorting each Array in the HashMap, and then sorting the HashMap based on the first element in each sorted Array. Thanks to Java 8, I managed to do it in 4 lines.

ArrayList<Map.Entry<String, ArrayList<String>>> results = new ArrayList<>(resultsMap.entrySet());
results.sort(Comparator.comparing(o -> {
    o.getValue().sort(String.CASE_INSENSITIVE_ORDER);
    return o.getValue().get(0);
}));

This code first converts the HashMap to an ArrayList of entries and uses the built-in sort method with a Comparator that sorts the inner Array (using the built-in sort method and String’s case-insensitive Comparator) and returns its first element for comparison. The performance impact of repeatedly sorting the Arrays was not noticeable, since the HashMap I am dealing with here usually only contains ~20 entries with about 2 to 4 elements in each Array.

Source Code

When I started working at VIP Traders, my job was to organize paperwork into folders. I would eventually transition into writing software to automate large portions of my own and others’ jobs, and the first step in that transition was an Android app.

I noticed that my coworkers were having to walk around our parking lot writing down the VIN numbers of the cars on a piece of paper, and then typing them into a spreadsheet. If that isn’t a prime candidate for automation, then I don’t know what is. So I wrote a demo project at home, and presented it to the CEO. They agreed that it would be an improvement, and I was reassigned from filing to software development. As per their requested specs, the app scans a VIN barcode, and allows the user to submit all the scanned VINs to a new Google Sheets spreadsheet. Pretty nifty.

Point the camera at a bar code

The code is scanned and data is entered by the user

Data is submitted to a spreadsheet

While I was working on the app, I noticed another coworker importing photos from a camera and renaming the files after the VIN, which he was typing in manually for each vehicle. So I extended the app to run on the company’s Samsung Galaxy Cam 2s and added a feature for capturing and storing photos.
Once a VIN has been scanned in, a button becomes available that leads to a page prompting the user to take the required photos. The photos are saved to the camera and named according to a convention based on the VIN.

This was a very fun project for me to develop, as it really pushed the boundaries of my abilities. I learned a ton about Android app development, and “invented” what I would later learn are called asynchronous callbacks. At the time, I called them “Post Execution Runnables”.

final CreateSpreadsheetTask task = new CreateSpreadsheetTask(sheetName.getText().toString(), context);
task.setPostExecuteRunnable(new Runnable() {
    @Override
    public void run() {
        // Runs once the background task has finished
        if (task.succeeded()) {
            // Do stuff
        } else {
            Toast.makeText(context, "Could not create spreadsheet", Toast.LENGTH_LONG).show();
        }
    }
});
task.execute();

I thought this solution was very cool, and was not at all surprised to find that it’s not a new idea in the slightest.

Special thanks to bees4honey for providing the open source VIN scanning C library needed to scan in VIN barcodes.
Unfortunately, I cannot share a link to the GitHub repository for the project, as it contains sensitive information for the company.

Understanding this post requires at least a cursory understanding of what Reddit is. If you don’t know what Reddit is, I highly recommend this video by CGP Grey.

When I realized that almost every subreddit has links to subreddits about similar topics in an easily accessible part of the page (thanks to Reddit’s REST API), I knew I had to find a way to make a map of the connections between all of the subreddits on Reddit. Not because it would be useful, but because it would be cool.

So, I whipped up a web crawler in Java and used Gson to parse the JSON responses. The related subreddits are added to a Queue and retrieved in FIFO order. A Graph data structure of Nodes and Edges is maintained until the crawling is done, at which point it is exported to CSV and imported into a program called Gephi, which I used to build the visualizations below.
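
Here is a rough sketch of that crawl loop. The endpoint, JSON field names, and the crawl cap are illustrative stand-ins rather than exactly what the real crawler does:

import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.net.URI;
import java.net.http.*;
import java.util.*;
import java.util.regex.*;

public class SubredditCrawler {
    static final Pattern SUB_LINK = Pattern.compile("/r/([A-Za-z0-9_]+)");

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        Gson gson = new Gson();
        Queue<String> queue = new ArrayDeque<>();   // FIFO frontier of subreddits to visit
        Set<String> visited = new HashSet<>();      // nodes
        List<String[]> edges = new ArrayList<>();   // (source, target) pairs

        queue.add("AskReddit");
        while (!queue.isEmpty() && visited.size() < 1000) {   // small cap for the sketch
            String sub = queue.poll();
            if (!visited.add(sub)) continue;

            HttpRequest req = HttpRequest.newBuilder(
                    URI.create("https://www.reddit.com/r/" + sub + "/about.json"))
                    .header("User-Agent", "subreddit-mapper/0.1").build();
            String body = http.send(req, HttpResponse.BodyHandlers.ofString()).body();

            // Scan the sidebar text for /r/ links to other subreddits
            JsonObject data = gson.fromJson(body, JsonObject.class).getAsJsonObject("data");
            if (data == null || data.get("description") == null || data.get("description").isJsonNull()) continue;
            Matcher m = SUB_LINK.matcher(data.get("description").getAsString());
            while (m.find()) {
                String related = m.group(1);
                edges.add(new String[]{sub, related});
                if (!visited.contains(related)) queue.add(related);
            }
        }

        // Dump a Source,Target edge list that Gephi can import as CSV
        System.out.println("Source,Target");
        for (String[] e : edges) System.out.println(e[0] + "," + e[1]);
    }
}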

There are nearly a million subreddits, each one with an average of 10 connections, so I had to trim the data set down to a more manageable size. I chose to limit it to only subreddits with more than 1,000 subscribers. This leaves some subreddits stranded, with no links to them from the central cluster, and as such they form a sort of Reddit “Oort cloud”. Nodes are subreddits, edges are links between subreddits, and node size is determined by the number of subscribers to that subreddit. I ran an algorithm called OpenOrd to form the clusters, and used those clusters of subreddits with high mutual linkage to determine node color.

Then, I ran an expansion algorithm to spread out the densely packed clusters to make it easier to see what is going on.

Next, I hid the edges between nodes, again in the interest of clarity.

I did some poking around in Gephi to determine which category of subreddits each cluster was made up of, and labeled them in Photoshop. Probably the most interesting trend I found while poking around was that gay porn subreddits tended to link to LGBTQ support group subreddits, which in turn linked to self improvement subreddits, which explains the proximity of the porn cluster to the self improvement cluster.

In this image I enabled node labels for subreddits with more than 10,000 subscribers. All of the images in this post are high resolution (4000×4000), so if you open them in a new tab you’ll be able to zoom in very far and read the labels much more easily.

The algorithm that generates these graphs is actually a sort of physics simulation, so watching it simulate the graph looks very cool. Below are a few gifs of the process in action. If they aren’t loading on your device or you would like to be able to zoom in on them, click “View full resolution”

View full resolution

View full resolution
I also had the scraper gather the creation date for the subreddits, and made an animation where the output was filtered by year, in order to display the growth of Reddit over time.

View full resolution

View full resolution

The final addition to this project was a map of the power moderators of Reddit. Because of the extreme number of edges in this case, I limited the scope of the scraping and visualization to only moderators of the 49 default subreddits. Each moderator got a node, and each edge meant that those moderators shared a subreddit. The more subreddits two moderators share, the higher the edge weight. The more subscribers a moderator was in charge of, the larger their node.
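
The edge weighting itself is simple: for every subreddit, each pair of its moderators now shares one more subreddit, so that pair’s edge weight goes up by one. A rough sketch with made-up input data:

import java.util.*;

public class ModeratorEdges {
    public static void main(String[] args) {
        // Made-up input: subreddit -> its moderator list (the real data came from scraping)
        Map<String, List<String>> mods = Map.of(
                "pics", List.of("alice", "bob", "carol"),
                "funny", List.of("alice", "bob"),
                "news", List.of("bob", "carol"));

        Map<String, Integer> edgeWeights = new HashMap<>();
        for (List<String> team : mods.values()) {
            for (int i = 0; i < team.size(); i++) {
                for (int j = i + 1; j < team.size(); j++) {
                    // Sort the pair so (alice,bob) and (bob,alice) count as the same edge
                    String a = team.get(i), b = team.get(j);
                    String key = a.compareTo(b) < 0 ? a + "," + b : b + "," + a;
                    edgeWeights.merge(key, 1, Integer::sum);
                }
            }
        }
        // Prints Source,Target,Weight rows ready for Gephi's edge importer
        edgeWeights.forEach((pair, weight) -> System.out.println(pair + "," + weight));
    }
}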


The 49 default subs have a total of 2627 moderators, with 2,673,294 connections between them. The top 10 moderators on Reddit are in charge of between 43 million and 200 million users each. Again, colored clusters represent high degrees of linkage; each of those small groups of moderators has added the others to their respective subreddits.

As a followup to this project, I also downloaded Wikipedia’s SQL database and parsed through it to generate a similar data set. However, with over 5 million articles, each easily containing over 50 links, the data set was too large for Gephi to handle. Unfortunately, I was not able to come up with a satisfactory way of filtering down which articles to include, and I eventually lost interest and moved on to more interesting projects.

Source code is available here

When I first came across bots running on Markov Chains on the internet, I knew I had to make my own. I decided that a Twitter bot that generates and tweets Captain’s Logs in the style of the Star Trek franchise would be pretty cool. So I spent some time learning how to build the various components I would need: a web scraper for gathering the scripts, the code required to interface with Twitter, a MarkovChain class, and the data structures and UI it would rely on.
The most interesting part of this project was the MarkovChain class, and of course the hilarious output. The account gained some traction when Wil Wheaton, an actor on the show, followed the account and retweeted several of its posts.
First, let’s look at some examples, and then we’ll dive into how it works.

In general, a Markov Chain is a statistical model that relates a word (or character) to the probability of any other word (or character) occurring after it. The core data structure of the program used to accomplish this is a Hashtable<String, Vector<String>>.
Every line of the script is broken down and fed into the Hashtable. In order to increase coherence, the bot uses a chain length of 3. To achieve this, a line is broken down in the following manner:
Input line: “Captain’s Log, stardate 8654.5, this is an example”
Output strings: “Captain’s Log,”, “Log, stardate”, “stardate 8654.5”, “8654.5, this”, “this is”, “is an”, “an example”
That way, at a minimum, any 3 words in order in the output are guaranteed to have occurred at least once in the original scripts.
When these strings are fed into the Hashtable, the result is that any two words can be used as a key to retrieve a list of word pairs that have occurred after them. Note that because this is a Vector (basically just an Array) and not a Set, duplicates will exist. The number of duplicates of an entry affects the probability of selecting that path, thus satisfying the Markov property.

If “Captain’s Log,” is used as the starting word pair, it will select randomly from the list of word pairs that have occurred after that. For example, the Vector for that “seed” might be {“Log, stardate”, “Log, supplemental”}.
It is easy to see how choosing each next word based on how likely it is to occur after the previous words in the source data set can lead to coherent and humorous output.
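
To make that concrete, here is a stripped-down sketch of the idea. The class and method names are simplified for this post rather than taken from the bot’s exact code:

import java.util.*;

public class MarkovChain {
    // Maps a word pair to every word pair that has followed it in the scripts.
    // Duplicates are kept on purpose: more occurrences means a higher chance of selection.
    private final Hashtable<String, Vector<String>> table = new Hashtable<>();
    private final Random random = new Random();

    // Break a script line into overlapping word pairs and record which pair follows which
    public void addLine(String line) {
        String[] words = line.trim().split("\\s+");
        for (int i = 0; i + 2 < words.length; i++) {
            String key = words[i] + " " + words[i + 1];
            String next = words[i + 1] + " " + words[i + 2];
            table.computeIfAbsent(key, k -> new Vector<>()).add(next);
        }
    }

    // Walk the chain from a seed pair, appending one new word per step
    public String generate(String seed, int maxWords) {
        StringBuilder out = new StringBuilder(seed);
        String current = seed;
        for (int i = 0; i < maxWords; i++) {
            Vector<String> options = table.get(current);
            if (options == null || options.isEmpty()) break;   // dead end: no known successor
            String nextPair = options.get(random.nextInt(options.size()));
            out.append(" ").append(nextPair.substring(nextPair.indexOf(' ') + 1));
            current = nextPair;
        }
        return out.toString();
    }
}

Seeding it with “Captain’s Log,” and calling generate produces lines like the examples above.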

I also built a GUI for customizing exactly how the scripts were parsed, to allow myself to generate dialog from specific characters, or even scene change notes. I also experimented with a Treknobabble generator to generate phrases that used as few words from the top 10,000 most common words as possible, but the results were less than satisfying. Outputs usually just contained people’s names and other “uncommon” words, rather than unique Star Trek technical words.

To see more tweets, visit the bot’s Twitter Account
To see the code, visit the GitHub page for the project