Feedback

The circle-dot (⊙) is the Hadamard (element-wise) product; I have now noted this under the formula table. S_H(v_x) is a count-min sketch of v_x. There is an entire paper (which I have referenced) that introduces the count-min sketch, and I have tried to briefly explain what it is in the "Introduction". It is a randomized data structure that approximates v_x with limited storage. As the paper mentions, this sketch is optional. The reason for the approximation is that v_x has one dimension per entity, and the number of entities can be very large for practical KBs, so this representation can be inefficient for some operations. The count-min sketch S_H therefore approximates v_x in fewer dimensions using hash functions.

EmQL's set representation is similar to Query2Box's: a_x is the weighted centroid of the set X and identifies the general region containing the elements of X. Query2Box encodes a set of entities as a box, while EmQL uses a region around the centroid. A major drawback of this approach is that it only works if entities that can appear together in a set have entity vectors that are close to each other, so the entity vectors need to be pre-trained with this in mind.
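For readers unfamiliar with the structure, here is a minimal, generic count-min sketch in Python showing how a weighted set vector v_x could be compressed into depth × width counters. It is not EmQL's implementation: the hash family ((a·x + b) mod p) mod width, the names CountMinSketch, sketch and hadamard, and the sizes used are assumptions for illustration only. The hadamard helper is included just to make the circle-dot notation concrete.

```python
import random

# Prime modulus for the (assumed) universal hash family ((a*x + b) mod p) mod width.
PRIME = 2**31 - 1


class CountMinSketch:
    """Generic count-min sketch approximating a non-negative vector v_x
    indexed by integer entity ids, using depth * width counters instead
    of one entry per entity."""

    def __init__(self, depth: int, width: int, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        # One randomly drawn hash function (a, b) per row.
        self.hashes = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
                       for _ in range(depth)]
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row: int, entity: int) -> int:
        a, b = self.hashes[row]
        return ((a * entity + b) % PRIME) % self.width

    def add(self, entity: int, weight: float) -> None:
        # Each row adds the weight to the bucket chosen by its hash function.
        for row, counters in enumerate(self.table):
            counters[self._bucket(row, entity)] += weight

    def query(self, entity: int) -> float:
        # Collisions only inflate counters, so the row minimum is the tightest
        # estimate; it never underestimates a non-negative weight.
        return min(counters[self._bucket(row, entity)]
                   for row, counters in enumerate(self.table))


def sketch(v_x: dict, depth: int = 4, width: int = 256, seed: int = 0) -> CountMinSketch:
    """Build a sketch of v_x from a sparse weighted set {entity_id: weight}."""
    s = CountMinSketch(depth, width, seed)
    for entity, weight in v_x.items():
        s.add(entity, weight)
    return s


def hadamard(u, v):
    """Element-wise (Hadamard) product u ⊙ v of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [a * b for a, b in zip(u, v)]


# Hypothetical weighted set X over a KB with 100,000 entities.
v_x = {17: 0.5, 42: 0.3, 99_999: 0.2}
s_x = sketch(v_x)        # 4 * 256 = 1,024 counters instead of 100,000
print(s_x.query(42))     # at least 0.3; exactly 0.3 unless 42 collides in every row
```

The trade-off this illustrates is the one described above: memory stays fixed at depth × width however many entities the KB has, at the cost of overestimated weights when hashes collide.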

I hope this clears some doubts.

Also, I will try to edit the article so that it is more intuitive.

MAULIKMAHESHBHAIPARMAR (talk) 07:02, 23 March 2021

Also, the paper does not say how the initial v_x vector for a set is created, nor how the weights for a set are decided; this is not mentioned anywhere, not even in the supplementary material.

I have created an "Ambiguity" section at the end of the article for questions the paper leaves unclarified.

MAULIKMAHESHBHAIPARMAR (talk) 07:25, 23 March 2021