To Do:

  • Improve the description generation code to remove markdown symbols and extra whitespace.
  • Fix relative link issue, where relative links that aren't on Github break: pahen/madge
  • Lots of messed up version parsing in xvik/generator-lib-java
  • Should be able to get date from angular/angular-cli
  • Break out and background the AWS CloudSearch document upload step so it uploads batches of documents rather than one document at a time, so we don't hit the limit of 10,000 document upload API requests per day.
  • Fix invalid versions in mapbox/node-sqlite3
  • European date format DD/MM/YYYY not being parsed correctly in stamplay/stamplay-nodejs-sdk
  • Probably need to sanitize the form element that is being rendered in the HTML of selz/plyr. Should also look for other potential XSS or injection attacks that aren't being handled by the existing markdown sanitization. Update: also found ljharb/qs, where an HTML tag is breaking rendering.
  • Write a detailed blog post about the serverless architecture to supplement the architecture diagram. Show multiple versions of the diagram as the project progressed for comparison, along with the rationale for the changes I made along the way.
  • Fix issue where suffixes on version numbers aren't included: shelljs/shelljs
  • Possibly related, but beta suffixes on version numbers seem to stop the date from being parsed in ryansobol/eslint-config-ryansobol and selsamman/amorphic-bindster
  • Use Github API data to follow forks back to the original Github repo so we have one HTML page for each repo, instead of accumulating a page for each fork as well.
  • Look into stylizing changelog tags such as "breaking", "security" with bootstrap labels to make them stand out more. Also tag releases with the same tags. Potentially also detect links to and use them to auto tag the release as security related.
  • Look into unbolding changelogs that use excessive bold. tlecancode/pkqd
  • Use Github API to fetch and attach some extra metadata such as number of stars and forks, as this can be used to highlight popular projects that have had recent updates.
  • Proper fuzzy search, with suggestions for the search box on the homepage, perhaps powered by Elasticsearch?
  • Proper 404 page, and proper 404 page for changelog not found in repo.
  • Support for the "Unreleased" version proposed by keepachangelog
  • Provide a readme badge service, for projects to potentially link back to changelogs on this site?
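
A rough sketch of the batching idea from the CloudSearch item above. It only groups documents into size-capped JSON batches and makes no AWS calls. The 5 MB cap and the "type"/"id"/"fields" document shape are my assumptions about the CloudSearch batch format and should be checked against the current AWS docs. The name batch_documents is hypothetical.

```python
import json

# Assumed CloudSearch batch size cap; verify against the current AWS docs.
MAX_BATCH_BYTES = 5 * 1024 * 1024


def batch_documents(docs, max_bytes=MAX_BATCH_BYTES):
    """Yield JSON array strings, each at most max_bytes when UTF-8 encoded."""
    current = []
    size = 2  # the enclosing "[" and "]"
    for doc in docs:
        encoded = json.dumps(doc, separators=(",", ":"))
        doc_size = len(encoded.encode("utf-8"))
        if doc_size + 2 > max_bytes:
            raise ValueError("single document exceeds the batch size limit")
        # A comma is needed before every document except the first in a batch.
        extra = doc_size + (1 if current else 0)
        if current and size + extra > max_bytes:
            yield "[" + ",".join(current) + "]"
            current, size = [], 2
            extra = doc_size
        current.append(encoded)
        size += extra
    if current:
        yield "[" + ",".join(current) + "]"
```

Each yielded string would then be sent as one upload request, so the request count scales with the number of batches rather than the number of repos.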


These may or may not happen:

  • Consider sponsoring cheeriojs to get a link back from the cheeriojs readme and website. It is a worthy project and a piece of technology that this crawler uses, so it would be fitting, and good promotion.
  • Detect changelogs which link to a github diff, instead of providing a human readable explanation. For example version 1.6.4 in this changelog. Do something better about this.
  • Possibly make the parser able to parse release dates that are one line below the version number, like in pegjs/pegjs
  • Automatically open issues on repos that have no timestamps on their changelog versions, with a link to this site and to an ideal changelog style guide. Potentially spammy? Also must have very high accuracy on date parsing and few missed date parses first
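
A minimal sketch of the "date one line below the version number" idea. This is not the parser the site actually uses: the regexes, the supported date formats (ISO, and "Month D, YYYY" with an optional comma) and the name parse_releases are all assumptions for illustration. It checks the heading line first, then the first content line after it, skipping blank lines and dash or equals underlines.

```python
import re
from datetime import date

# Version with an optional pre-release suffix, e.g. "1.0.0-beta.1".
VERSION_RE = re.compile(r"^\s*#*\s*\[?v?(\d+(?:\.\d+){1,2}(?:-[0-9A-Za-z.]+)?)")
ISO_RE = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")
MONTH_RE = re.compile(r"([A-Z][a-z]+)\s+(\d{1,2}),?\s+(\d{4})")
MONTHS = {name: i + 1 for i, name in enumerate([
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December"])}


def find_date(text):
    """Return the first recognizable date in text, or None."""
    m = ISO_RE.search(text)
    if m:
        try:
            return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        except ValueError:
            return None
    m = MONTH_RE.search(text)
    if m and m.group(1) in MONTHS:
        try:
            return date(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2)))
        except ValueError:
            return None
    return None


def parse_releases(lines):
    """Return (version, date or None) for each line that starts with a version."""
    releases = []
    for i, line in enumerate(lines):
        m = VERSION_RE.match(line)
        if not m:
            continue
        found = find_date(line)
        if found is None:
            for nxt in lines[i + 1:]:
                stripped = nxt.strip()
                # Skip blank lines and "----" / "====" underlines.
                if not stripped or set(stripped) <= set("-="):
                    continue
                if not VERSION_RE.match(nxt):
                    found = find_date(nxt)
                break  # only the first content line is checked
        releases.append((m.group(1), found))
    return releases
```

A body line that happens to start with something like "1.2" would still be picked up as a version here, so a real parser would need a stricter notion of what counts as a heading.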


September 2016:

  • Remembering the last version number and version date metadata for detecting updates between crawls. Also adding the version number to the homepage feed of recently crawled repos.
  • Started caching the URL of the changelog file to save on the number of Github API calls to list the files in the root of the repo. This will be friendlier to their servers, and lessen the risk of hitting a rate limit.
  • Added regex match for dates that have no comma, in brunch/less-brunch
  • Added "changelog.mkd" as a changelog filename for tomnomnom/gron
  • Added search box to the homepage with instant suggestions as you type
  • Set up the basics of indexing the names of discovered repos in AWS CloudSearch, for proper search functionality.
  • Fixed case of not detecting underlines formed by dashes in pegjs/pegjs
  • Fixed scrambled changelog due to extra markup in the headers archiverjs/node-archiver
  • Updated architecture diagram, retained old one for comparison.
  • Fixed a version number in the body being parsed because a GitHub issue number was being seen as a date: makojs/cssnext
  • Automatically generating a sitemap so Google and other bots can discover the changelog pages better. Also added robots.txt and humans.txt
  • Dynamic title, meta, and open graph tags for more love from search engines and social media.
  • Also parsing as a changelog, for newrelic/node-newrelic
  • Moved deduping and rate limiting to the changelog discovery side, rather than the crawler side, to reduce the number of Kinesis requests that get made.
  • Persisted metadata on all discovered repos forever. (Current model has a rolling 24 hour retention window).
  • Retained rendered HTML pages forever so that deeplinks to site are always served quickly without hitting a JIT render, but now trigger asynchronous background regeneration of these HTML pages on a schedule so they stay up to date.
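
A sketch of what dedupe plus rate limiting on the discovery side could look like. The class name, the one hour default interval and the in-memory dict are all hypothetical. State held in memory would not survive across separate Lambda invocations, so a real version would need some shared store, which is not shown here.

```python
import time


class DiscoveryGate:
    """Lets a repo through at most once per min_interval_seconds."""

    def __init__(self, min_interval_seconds=3600, clock=time.monotonic):
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._last_sent = {}  # lowercased "owner/name" -> time last let through

    def should_enqueue(self, repo):
        # Lowercase the key so "Owner/Repo" and "owner/repo" count as one repo.
        key = repo.lower()
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval:
            return False
        self._last_sent[key] = now
        return True


def filter_discovered(repos, gate):
    """Keep only the repos that should be handed on to the crawler queue."""
    return [repo for repo in repos if gate.should_enqueue(repo)]
```

Only the repos returned by filter_discovered would be put on the stream, so repeated sightings of the same repo within the interval cost nothing downstream.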

August 2016:

  • Added last crawled timestamp to the changelog API response
  • Fixed missed last line on itsasbreuk/itsa-react-editor and veams/veams-cli
  • Fixed this koajs/mount
  • Initial rerelease of the JSON API as a serverless, Lambda-powered application, now with a human friendly HTML website.