Course:CPSC312-2016-Project1-TransitPlanner

Transit Planner

Authors: Dylan Symons, Dafang Cao

What is the problem?

The Greater Vancouver has many buses and bus stops maintained by TransLink. Navigating the transit system can be difficult. Many questions can be asked given a transit system, some of which are:

   "Which buses stop at stop A?"
   "Which buses go from stop A to stop B?"
   "Which stops can be reached from stop A within 4 stops?"
   "How can one go from stop A to stop B?"

We aim to find out if Prolog would be a suitable tool for creating a system to answer such queries as these.

The data used as program's input will come from https://developer.translink.ca/ServicesGtfs/GtfsData in GTFS format.

What is the something extra?

In addition to answering queries to find bus routes and stops, our system will aim to answer queries that requires factoring in route transfers.

What did we learn from doing this?

Prolog is a viable option for the natural language processing component. For the simple, narrow queries involved in this program it works quite well. There is a limited set of types of answer to look for (trip, bus, route, stop, etc.) and data for most are provided concretely. Similarly, there are a limited number of specifics to query on (specific stops, buses, etc.) and that data is provided concretely. Although our system can't handle all of the possible queries, with more time and resources the simple parsing and interpretation it does could be extended to be much more thorough.

Prolog provides a fairly simple and straightforward framework for creating this TransitPlanner system. Simple queries such as which buses stop at a given stop, which stop is closest to a given latitude and longitude. Our system is also able to find bus rides from one station to another, within certain time constraints. More advanced queries should also be answerable with additional work on the system. The queries are answered on the order of seconds, but without another system to compare with we can't say whether this is fast, slow, or average.

One problem we ran into a few times was the limitation of the stack size. The recursive nature of Prolog required the use of a considerable amount of stack space when looking through the data. Specifically, the large number of stop times was prohibitive at times. A more advanced system that answers queries involving multiple stop times (routes between multiple stops) would require exponentially more storage. Although a larger software system would likely have more hardware resources to draw from than we used for this project, there would be alternatives that don't require as much stack space per query. For this reason, a system that stores and accesses the data in a more efficient manner would be preferable over Prolog.

Prolog may be undesirable for use as a database. Having to load the whole knowledge base into memory imposes a slow start-up time as well as non-trivial memory usage. About 90 seconds are required to parse and load 90MB of data into memory, and the memory usage after loading was a staggering 460MB. This can be more problematic if knowledge base is too big for storage in memory.

That being said, Prolog offers powerful meta-programming capability. Our implementation took full advantage of being able of generate facts and add them to knowledge base at run time. Our program uses GTFS files as input. At the beginning of the program, GTFS files consisting of CSV(comma seperated value) are parsed. Each row of data is asserted as fact, and each column is asserted as rules for accessing data. The dynamic nature of knowledge base creation allows for certain flexibility in the input data format. For example, our implementation does not assume a certain column order in the program input. It was also experimented to parse input file into prolog source code, before launching. While doing so speeds up the start-up time significantly (50 seconds), memory usage remained unaffected.

Overall, our study has revealed that Prolog would be able to answer the kinds of queries we want, but is more costly in terms of storage and possibly time than other frameworks would be.