Course talk:CPSC532:StaRAI2020:Query2Box and Faithful Embeddings for Knowledge Base Queries.
Contents
Thread title | Replies | Last modified |
---|---|---|
Feedback | 2 | 08:00, 23 March 2021 |
Aid understanding | 1 | 06:35, 23 March 2021 |
Question | 1 | 05:16, 22 March 2021 |
I found it difficult to understand EmQL from the "Representation" section. The Query2Box paragraph was easy to understand, but the EmQL paragraph was impenetrable for me. E.g., I don't know what an "entity set" is! Is it a set of a fixed number of entities (e.g., the set of 6 students in CPSC 532) or the set of all graduate students (which keeps changing)? Why does that formula for a_X give a weighted centroid? S_H(V_x) is not defined. Don't use notation you haven't defined. What is circle-dot? I would also like to see a clearer explanation of relation following in EmQL. Can it be written in a more intuitive way like Query2Box?
The circle-dot is the Hadamard (element-wise) product; I have now noted this under the formula table. S_H(v_X) is a count-min sketch of v_X. The count-min sketch is introduced in a separate, lengthy paper, which I have referenced, and I tried to summarize it briefly in the "Introduction": it is a randomized data structure that approximates v_X with limited storage. As the paper notes, the sketch is optional. The reason for the approximation is that v_X has one dimension per entity, which can be very large for practical KBs, so this representation can be inefficient for some operations. The count-min sketch S_H therefore approximates v_X in fewer dimensions using hash functions. The entity set in EmQL is similar to the one in Query2Box: a_X is the weighted centroid of the set X and identifies the general region containing the elements of X. Query2Box uses a box to encode a set of entities, while EmQL uses the region around the centroid. A major drawback is that, for this to work, entities that can appear together in a set need embedding vectors that are close to each other, so the entity vectors have to be pre-trained with this in mind.
I hope this clears up some doubts.
Also, I will try to edit the article so that it is more intuitive.
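To make the two pieces above more concrete, here is a minimal NumPy sketch of a weighted centroid a_X and a count-min sketch of v_X. It is only an illustration, not the paper's implementation: the function names, the random lookup tables standing in for hash functions, the random entity matrix `E`, and the sizes (1000 entities, depth 4, width 50) are all assumptions. When the weights in v_X are non-negative and sum to 1, a_X is a convex combination of the member embeddings, which is why it can be read as a weighted centroid.

```python
import numpy as np

def make_hashes(n_entities, depth, width, seed=0):
    # Stand-in for `depth` hash functions: each row maps entity id -> bucket.
    rng = np.random.default_rng(seed)
    return rng.integers(0, width, size=(depth, n_entities))

def count_min_sketch(v, hashes, width):
    # Row i accumulates the weight of every entity hashed to each bucket.
    depth = hashes.shape[0]
    s = np.zeros((depth, width))
    for i in range(depth):
        np.add.at(s[i], hashes[i], v)
    return s

def estimate(s, hashes):
    # Estimated weight of entity j = min over rows of its bucket value.
    rows = np.arange(hashes.shape[0])[:, None]
    return s[rows, hashes].min(axis=0)

def weighted_centroid(v, E):
    # a_X = sum_j v[j] * E[j], with one embedding per row of E.
    return v @ E

n_entities, dim, depth, width = 1000, 8, 4, 50
E = np.random.default_rng(1).normal(size=(n_entities, dim))  # toy embeddings

# Weighted set X = {3, 17, 256} as a vector with one slot per entity.
v = np.zeros(n_entities)
v[[3, 17, 256]] = [0.5, 0.3, 0.2]

hashes = make_hashes(n_entities, depth, width)
b_X = count_min_sketch(v, hashes, width)   # depth * width numbers instead of n_entities
a_X = weighted_centroid(v, E)
est = estimate(b_X, hashes)
```

With non-negative weights the estimate can only overestimate a weight (when entities collide in every row), never underestimate it, so members of X are never lost; the cost is an occasional false positive.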
I think the main issue, "the problem of faithfulness to deductive reasoning", is not explained on the page. Can you explain what is meant by faithfulness to deductive reasoning, maybe in layman's terms or with an example?
I'm trying to visualize the regions defined by EmQL. Quick question: are those regions convex?
Hi Lucca, the issue here is that EmQL represents a set of entities by a weighted sum of entity vectors, so the logic is similar to point-based embeddings. You could say a point is a trivial convex set.
Also, when computing the answer to a query such as relation following, the answer is generally the top-k most similar triple embeddings. Because k is the same for every query, the resulting region is not necessarily convex (unlike in a threshold-based system, where the region is a ball centred on the centroid with radius equal to the threshold, which is convex).
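The difference between a fixed k and a threshold can be seen on a toy example. The following is only a sketch under assumptions: the five 2-D points, the helper names `top_k` and `within_radius`, and the use of Euclidean distance for both rules are made up for illustration and are not taken from EmQL, which may use a different similarity.

```python
import numpy as np

def top_k(a, E, k):
    # Indices of the k embeddings closest to a, regardless of how far they are.
    dist = np.linalg.norm(E - a, axis=1)
    return np.argsort(dist)[:k]

def within_radius(a, E, r):
    # Indices of all embeddings inside the ball of radius r centred at a.
    dist = np.linalg.norm(E - a, axis=1)
    return np.flatnonzero(dist <= r)

# Three entities clustered near the origin and two far away.
E = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
a = E[:3].mean(axis=0)  # centroid of the cluster

picked_top_k = set(top_k(a, E, k=4).tolist())
picked_ball = within_radius(a, E, r=1.0).tolist()
```

Here the threshold rule returns only the three clustered entities, while k = 4 also pulls in the distant point at (5, 5). With a fixed k, the set of retrieved entities is determined by where the data happen to lie rather than by a fixed ball around the centroid.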