Transitland Version 2

by Drew Dara-Abrams and Ian Rees

It’s been a quiet year on this blog, and we’re overdue to share all the work that’s been going on behind the scenes since the last post. What first started as two simple, separate, little experiments has grown into the seeds of a major revision of the entire Transitland platform: Transitland Version 2.

In this post we’ll step through:

After reading (or at least skimming :) this post, we welcome you to complete a survey about Transitland v2 and join the list to beta test.

Tlv2 Functionality and Goals

Tlv2 (as we’ve come to call this new version) provides continuity on the most used parts of the Transitland platform. Tlv2 will still provide a canonical service at transit.land that includes a Feed Registry of open GTFS data feeds from around the world, a queryable API, and a range of user interfaces. Tlv2, like Tlv1, will be powered by open-source components. Tlv2’s API will continue to be accessible without charge, and most of the Tlv1 API will continue to be available during the transition. (As before, we will enforce rate limits to ensure no one user hogs too many server resources; more on how rate limits will change below.)

What we are changing in Tlv2 is in support of three goals:

  • Higher Performance
  • More Modular Architecture
  • Distributed Deployments

Goal #1: Higher Performance

Tlv1 successfully parses and imports GTFS feeds from thousands of transit agencies. However its “FeedEater” component slows down or halts on a handful of the very larger GTFS feeds — typically single feeds that cover entire countries. This has always been a constraint on Tlv1.

Interline’s new Gotransit library solves this problem. Gotransit is a new transit data library written in the Go programming language, which provides finer-grained control of memory usage, as well as much easier patterns for concurrency and parallelization of complex tasks. Gotransit replaces replaces FeedEater, and will be used for feed validation, pre-processing, and importing data into the Tlv2 database. Gotransit is efficient enough to handle the largest feeds in a small memory footprint, and is fast enough to encourage interactive use; imports that took hours with FeedEater are completed in minutes by Gotransit.

Gotransit began as an experiment by Ian Rees over the last winter holidays, and it now complete enough to power multiple Interline projects, from the San Francisco Bay Area Regional GTFS Feed (which merges ~30 feeds into one) to a transit-to-routing-engine pipeline that Interline built for another client (which covers transit service for an entire country!). We look forward to releasing Gotransit under an open-source license shortly.

Transitland’s most demanding API users have also hit performance limitations over the years. As part of Tlv2, we’ll be providing bulk data dumps that give users another option. Use the Tlv2 API to explore data or power a small web app; use Tlv2 data extracts to power a large analysis, a giant web app, or a country-wide routing engine.

Goal #2: More Modular Architecture

Tlv1 has succeeded in part due to its “monolith” architecture. FeedEater, the API, and administrative operations all live within the same Transitland Datastore codebase. This has allowed for efficient reuse of data models, helpers, and test code. The single Datastore codebase is also straightforward to deploy and configure for the canonical transit.land site. The Datastore codebase can be reused — some have in fact redeployed it for their own regional or internal GTFS management needs—but customizing and redeploying the Datastore is not straightforward.

For Tlv2, we’re breaking the Transitland Datastore into modular components. Some components will be in Go (data processing), some will remain in Ruby (general back-end) and JavaScript (user interfaces). The canonical transit.land deployment will continue to use all. Other deployments by Interline, our partners, or open-source users may only use a subset. This more modular architecture will also allow deployments that connect Tlv2 components to internal data pipelines, routing engines, and other tools that may not be relevant to the canonical transit.land deployment.

While the software components of Tlv2 will be more modular, they will still be glued together using the data schemes developed for Transitland, including Onestop IDs and a common feed format spec (described below). Additionally, we will be sharing our database schema in versions appropriate for PostgreSQL and SQLite to help simplify additional application development, including outside of the canonical transit.land.

Goal #3: Distributed Deployments

The Feed Registry at transit.land will continue to be our canonical list of GTFS feeds. However, some deployments of Tlv2 components may want to use just a subset of the Feed Registry or a private list of feeds. We also want to open up the canonical Feed Registry for more collaborative tending, beyond just staff from Interline and Trillium Transit.

The second of the two experiments that has led to Tlv2 is called the Distributed Mobility Feed Registry. DMFR (for short) is a standard way of representing a list of GTFS feeds as one or more JSON files. It also currently supports GTFS-RT, and we expect it to extend to GBFS (bike rental) and MDS (e-scooter rental) in the future.

(Some history: This experiment actually started way back in the early tests of Transitland, before the fully built out Tlv1 monolith. It’s still archived on GitHub. Now we’ve come full circle to revisit and build out this approach.)

Soon the transitland/distributed-mobility-feed-registry repository on GitHub will become the source of truth for the Transitland Feed Registry of public open data feeds. We’ll accept pull requests against the JSON files to add or edit feeds. This should substantially lower the burden for outside contributors to contribute new feeds and help manage existing records. However, this will be just be one repository of DMFR files. In effect we’re blowing up the single Transitland Feed Registry and inviting the creation of many.

Gotransit can manage a datastore using one or more DMFR files, meaning that the canonical transit.land, distributed instances of Tlv2, and temporary interactive sessions with Gotransit on the command line can all share the same representation for mobility data feeds. At Interline, we’ve gone one step further and also use DMFR files to configure some of our routing engines and any other workflow that takes two or more mobility data feeds as input. We welcome other consumers or transformers of GTFS, GTFS-RT, and related data to try a DMFR file for their own needs.

Tlv2 Architecture Diagram

Now that we’ve introduced and discussed the new components of Tlv2, here’s a diagram of how they connect:

Tlv2 architecture diagram

Gradual Deprecation of Tlv1

Tlv1 requires a fair amount of attention and resources to maintain. While developing Tlv2, we’ve already had to split our attention. As we shift our focus to deploying and supporting Tlv2, we’ll be gradually deprecating Tlv1. Don’t worry — it won’t be too quickly.

Step 1: Tlv2 API Preview The initial release of Tlv2 will include a preview of a new GraphQL API that provides flexible, structured access to transit data. All new data will be available through this API from the start. API access will require a key. Sign-up will be free. We will institute tight rate limits at first and increase them over time, as we better understand the type and complexity of queries that users run.

Step 2: Deprecate Tlv1 Schedule API Immediately upon Tlv2 release, we will stop importing new schedule data into Tlv1. The ScheduleStopPair data model is the most expensive part of Tlv1 to run and maintain. We’ll first stop imports, keeping the API available against the last imported version. We’ll share tutorials on how to switch to the Tlv2 GraphQL API to access schedules. Then, after sufficient notice, we’ll empty old Tlv1 schedule data and remove that API endpoint.

Step 3: Other Tlv1 Entity API Endpoints While the Tlv2 GraphQL API will make a number of complex queries much simpler than with the current Tlv1 API, it may take a while for us to make all Tlv1 query options available in the Tlv2 API. During this time, we’ll try to continue to power the Tlv1 APIs (with the exception of the schedules endpoint). We’ll recommend that you build any new applications on the Tlv2 API, in case we do have to announce any specific incompatibilities during the Tlv1-to-Tlv2 API transition.

We’ll post updates to this site. Or watch @transitland and @interline_io for updates on Twitter.

How You Can Help

Here are ways you can help with the rollout of Tlv2:

  • Provide your input on Tlv2 by taking our survey and signing up for the beta tester list.
  • Comment on the DMFR GitHub repository. Contribute edits once we migrate over the Feed Registry to DMFR files.
  • If you work for a GTFS producing transit operator or mobility provider, contact Interline to learn more about “Tlv2 on premise” and regional GTFS feeds. It’s consulting projects like this that help us serve both mobility providers and further the development of Gotransit.
  • Contribute financially to the upkeep and growth of Tlv2 through the Linux Foundation’s CommunityBridge donation page for Transitland. These contributions go directly toward upkeep of the canonical transit.land.