Authors: Billy Cromb, Tyler House
What is the problem?
American elections are fast-moving, and it can be hard to keep up with the news. You might hear about something a candidate said, claimed, or did, but not know how to find out whether it is true or how to put it into context. Organizations like NPR and PolitiFact try to give people some way of telling whether what a politician said is true or false, but it can be hard to find a given fact or topic unless you know the exact phrasing of the statement in question. Politicheck is a natural language interface (NLI) system that matches a list of input keywords to the topics and claims in its knowledge base and provides information on the political statements that match those keywords.
What is something extra?
Politicheck searches its knowledge base through its NLI and returns information about a matching statement: the exact quote, the person who made it, the independently verified veracity of the statement, a link to the news source of that verification, and a brief summary. The system also lets a user find statements that match all of the input keywords, or statements that match any of them, so users can narrow their search as they like and put their questions and curiosities into context. In class, when we looked at NLIs, we focused on making queries about data stored as relations. We've inverted this kind of system: we can still make structured queries, but most of the information about the claims in our knowledge base is entered as natural language. This means that adding a new claim could, in theory, be as easy as entering a short English summary of the actual quote or statement in question. Most of the relations that claim participates in are then inferred by Politicheck as it processes the sentence when a query is made.
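The all/any matching modes can be sketched in a few lines of Prolog. The predicate name `matches/3` here is hypothetical, not necessarily the one used in Politicheck:

```prolog
% matches(Mode, Keywords, Words): a sketch of the two search modes.
% In 'all' mode every keyword must appear among the statement's words;
% in 'any' mode a single match suffices.
matches(all, Keywords, Words) :-
    forall(member(K, Keywords), member(K, Words)).
matches(any, Keywords, Words) :-
    member(K, Keywords), member(K, Words).
```

For example, `matches(any, [putin, russia], [i, do, not, know, putin])` succeeds, while `matches(all, [putin, russia], [i, do, not, know, putin])` fails because "russia" does not appear in the statement.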
What did we learn from this?
Top-level queries are processed using the predicate politicheck(LTS, PS, M, S, Q, L): the user first provides a list of strings, which are parsed to match keywords. In class, the vocabulary used in the natural language trees we developed was fairly simple. The interfaces we built broke each sentence down into a noun phrase and a verb phrase, which were then broken down into elements like determiners, nouns, adjectives, and modifying phrases. Parsing the sentence structure of a speaker like Donald Trump turns out to be a fairly complicated task for an NLI. The words he uses are deceptively simple, yet the structure of his sentences can be difficult to parse because he frequently repeats words and uses many run-on sentences. He also sometimes uses words such as "bigly" and "braggadocious" that are not actually words at all, and therefore cannot be correctly categorized. For our natural language processing, we therefore had to include finer distinctions, such as specifying that a modifying phrase can be a prepositional phrase, and we had to add many facts stating which words are prepositions. Another distinction we made was that any word can be considered a noun. While this is obviously not true in general, it was necessary given the lexicon of Donald Trump. Together, these changes allowed us to match our input keyword strings to the statements entered in our knowledge base as claims.
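A minimal sketch of those grammar distinctions, in the difference-list style we used in class. The clause shapes and the sample dictionary facts here are illustrative, not the exact rules in Politicheck:

```prolog
% A noun phrase: a determiner, adjectives, a noun, then an optional
% modifying phrase, each consuming words from the difference list.
noun_phrase(L0, L4) :-
    det(L0, L1), adjectives(L1, L2), noun(L2, L3), mp(L3, L4).

% A modifying phrase is empty, or a prepositional phrase.
mp(L, L).
mp([P|L0], L1) :- preposition(P), noun_phrase(L0, L1).

% Any word can act as a noun -- needed for words like "bigly".
noun([_|L], L).

det([the|L], L).
det(L, L).                           % the determiner is optional
adjectives([A|L0], L1) :- adj(A), adjectives(L0, L1).
adjectives(L, L).

% Example dictionary facts of the kind we had to enter by hand.
preposition(of).
preposition(about).
adj(notorious).
```

With these clauses, `noun_phrase([the, notorious, rbg], [])` succeeds: "the" is consumed as a determiner, "notorious" as an adjective, and "rbg" as a noun under the any-word-is-a-noun rule.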
All we are really doing to process the sentences in these claims at this point is pulling keywords, which a simpler algorithm could also do. But because we are breaking the sentences down into parts, it would be relatively easy to modify this system to extract more complex information about a sentence, given more time to implement it and a deeper knowledge of NLP. We had originally planned to do much more ambitious processing, but it turned out to be quite difficult, so we scaled back our plans to fit the available time.
Another fun aspect of this project was working with the Unique Name Assumption (UNA). We set things up so that several different strings correspond to a single individual, and that individual can then be considered a "Topic" of a statement. To do this, we created the person_phrase predicate, which uses a difference list to find the different strings that map to an individual within our NLI.
Below are examples of such person_phrases:
person_phrase([rbg|R], R, rbg).
person_phrase([ruth, bader, ginsberg|R], R, rbg).
person_phrase([ginsberg|R], R, rbg).
person_phrase([the, notorious, rbg|R], R, rbg).
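Given these facts, each surface form consumes its tokens from the front of the difference list and yields the canonical individual, leaving the rest of the sentence for further parsing:

```prolog
?- person_phrase([the, notorious, rbg, tweeted, today], Rest, Who).
Rest = [tweeted, today],
Who = rbg.
```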
Another thing we learned is that a Prolog-compatible dictionary library would be a massive asset for NLIs. Every time a claim is added to the knowledge base, corresponding facts have to be added so that Prolog can parse the summaries using the NLI. For example, adding the claim that corresponds to the quote "I don't know Putin." required adding "know" and "do" as verbs, "putin" as a person phrase, and "russia" and all identified synonyms as keywords. Elements corresponding to keywords and topics should be added by the system administrator. Using a dictionary to categorize words as grammatical constructs such as verbs, nouns, pronouns, and adjectives would greatly improve the experience from the system administrator's point of view.
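The per-claim facts for that example look roughly like the following. The person_phrase shape matches the examples above; the `verb/1` and `keyword/1` predicate names are illustrative and may differ from the exact ones in Politicheck:

```prolog
% Facts that must be entered so the NLI can parse "I don't know Putin.":
verb(know).
verb(do).
person_phrase([putin|R], R, putin).
% "russia" and its identified synonyms registered as keywords.
keyword(russia).
```

A dictionary library would generate the grammatical facts (verbs, nouns, adjectives) automatically, leaving only the keywords and topics for the administrator to enter.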
Here is an example query that looks for any statements relating to "the notorious rbg" and the result found by our program:
?- politicheck(["the notorious rbg"], trump, any, S, Q, L).
S = true,
Q = "Something happened recently where Justice Ginsburg made some very, very inappropriate statements toward me and toward a tremendous number of people, many many millions of people that I represent. And she was forced to apologize.",
All in all, we believe this proof-of-concept project has been successful. We have shown that natural language processing and logic programming can be used to process queries about political statements. While there is plenty of room to improve the project, such as adding dictionary support, using logic programming to perform queries on political statements is certainly possible.