Course:GQL2SQL

Section:
Instructor:
Email:
Office:
Office Hours:
Class Schedule:
Classroom:
Important Course Pages
	Syllabus
	Lecture Notes
	Assignments
	Course Discussion
	[[Category:]]

GQL2SQL

Authors: Greg, Aaron

What is the problem?

GraphQL is an API specification designed as a declarative replacement for REST. Web developers today have to write a both a GraphQL schema and a database schema to enable GraphQL queries to be made against the database data. We plan to eliminate this second step so that developers only need to write a GraphQL schema, and they get a database schema for free.

To accomplish this challenge, we will build on top of the work done by the Hasura Haskell GraphQL parser (https://hackage.haskell.org/package/graphql-parser-0.1.0.1/candidate), using our own Haskell code to output SQL commands that specify a database schema in PostgreSQL. Our main question for this assignment: does the functional paradigm lend itself well to program transformations and code generation?

What is the something extra?

In addition to answering the question above, we will create a working application which take a .graphql file as an input, parses it using the Hasura Haskell GraphQL, traverse the Abstract Syntax Tree data structure outputted by the parser, and outputs valid PostgreSQL schema code that corresponds - as best as possible - to the GraphQL Schema. The basic design is as follows:

If a GraphQL Object Type ends with the name "_ID", it is represented as a composite primary key for another table.
All other GraphQL Object Types are generated as tables.
All primitive fields of a GraphQL Object Type are generated as columns to tables.
Lists of primitive fields are ignored (since this implies their schema is not in first normal form).
Atomic non-primitive fields of a GraphQL Object Type are represented by a foreign key constraint on the table using the primary key(s) of the referenced table.
Lists of non-primitive fields, references by both Object Types, are represented by the addition of a Many-To-Many table containing only the primary keys of the referenced tables.
All other GraphQL input types are ignored.

What did we learn from doing this?

Generating a relational database schema from GraphQL is not only possible, but surprisingly straightforward. We found that there is a nice one-to-one correspondence between GraphQL types and SQL tables, and that one-to-one, one-to-many, and many-to-many relationships can be derived straightforwardly from using GraphQLs type system as a guide, as well as using the algorithm outlined in the above section.

One problem we discovered with translating GraphQL to SQL Schema was how to interpret primary keys - that is, the main piece of data that uniquely identifies an entity. We decided to use a convention that the first field of a GraphQL schema must be mandatory (denoted using the "!" syntax in GraphQL), otherwise we would not generate a table for that type.

To answer the question we set out to resolve: the functional paradigm - and Haskell in particular - lends itself well to program transformation and code generation. One reason for this is that Haskell's strong type reduced the amount of bugs that could have been introduced when traversing the Abstract Syntax Tree. Another reason comes about in the kinds of algorithms used in the functional vs. object oriented style:

In the object oriented style, Abstract Syntax Trees are best traversed using the Visitor Pattern which utilizes double-dispatch. This has the benefit of putting all complex logic in a single file, but has the drawback of requiring additional knowledge of design patterns to understand how the code is executed.
In the functional style, the code becomes much more straightforward, as one can directly traverses the abstract syntax tree using recursive descent, which makes the code more readable since it is easy for a notice to understand how the traversal logic executes at runtime.

However, one downside we noticed with the functional paradigm is that using Symbol Tables - a data structure that stores nodes of an Abstract Syntax Tree for later use - becomes much more painful to use when compared to its object oriented counterpart because:

The symbol table needs to be passed around to every function in the functional style - even if the function only uses it to pass down to its child node. This adds additional boilerplate code to every visit function that the object oriented style avoids by allowing methods to have access to class fields that do not need to be passed as parameters.
Adding elements to the symbol table requires additional complex state threading logic that flows throughout the functional code. This not only makes the code harder to read and understand since a reader has to be familiar with state threading patterns in a functional paradigm to understand why tuples of (result, state) are repeatedly passed throughout the code. In the Object Oriented approach, the user only needs to understand state an abstract level (e.g. that a call to table.addElement(element) will modify the state of the existing table), and not have to worry too much about how the state gets threaded.

Our summary is that the functional style is ultimately makes program transformation easier to reason about, and Haskell's type system likely decreased the number of bugs that would have been introduced if implemented in another language. However, it comes with a tradeoff of additional boilerplate required to get the symbol table logic working effectively. In the future, we would likely choose a multi-paradigm language to solve this problem, that way we can get the elegance and understandability of the functional approach along with the easy of use of class fields and (simple) state that the object oriented approach provides.

Links to code etc

Link to Github Code: https://github.com/aaronVerones/gql2sql

Build using stack build

Run using stack exec sql2gql-exe <path to graphql file>