Course:CPSC532:StaRAI2020:TableDetection


Table Recognition in Documents

We explore two methods for table detection in documents.

Authors: Lucca Siaudzionis

Abstract

We explore the problem of table detection in documents: given a PDF document (potentially a scanned one), the task is to locate its tables and extract their contents. Two methods are compared. The first is presented by Schreiber et al. in "DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images" [1]; the second is presented by Qasim et al. in "Rethinking Table Recognition using Graph Neural Networks" [2].

Related Pages

The CPSC 532P page on Graph Neural Networks.

Content

Schreiber et al. and Qasim et al. both tackle the problem of detecting and parsing tables in documents. A key aspect they have in common is that neither uses any of the PDF metadata; both effectively treat the document as an image. A positive consequence is that their methods work on scanned documents as well as born-digital ones.

DeepDeSRT by Schreiber et al.

From the way the task is defined, there are two parts that naturally arise:

  1. Detect the table; and
  2. Parse the elements and structure of the table, figuring out its rows, columns, and cell values.
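
The two steps above can be sketched as a small pipeline. This is only an illustrative skeleton, not the authors' code: the names Box, Table, recognize_tables, detect, and parse are all hypothetical, and the two models are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical convention: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class Table:
    box: Box                # where the table sits on the page (step 1)
    cells: List[List[str]]  # parsed contents, one inner list per row (step 2)


def recognize_tables(
    page: object,
    detect: Callable[[object], List[Box]],
    parse: Callable[[object, Box], List[List[str]]],
) -> List[Table]:
    """Run detection first, then structure parsing on every detected region."""
    return [Table(box, parse(page, box)) for box in detect(page)]
```

Keeping the two stages behind separate callables mirrors the split in the task definition: either stage can be swapped out without touching the other.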

Schreiber et al. observe that table detection is conceptually similar to object detection, so they apply well-known object detection solutions to it. Because little training data is available for table detection, they use transfer learning to improve performance on this task.
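
Framing table detection as object detection means a predicted table region can be scored like any other detected object. A common way to do this is intersection over union (IoU) between the predicted and the ground-truth box. The sketch below is a generic illustration, not the evaluation code from [1]; the function names, the (left, top, right, bottom) box convention, and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    # Width and height of the overlap; zero when the boxes do not intersect.
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_detected(predicted, truth, threshold=0.5):
    """Count a prediction as correct when it overlaps the truth enough.

    The default threshold is an assumption chosen for illustration.
    """
    return iou(predicted, truth) >= threshold
```

For example, two equally sized boxes that share half their area have an IoU of one third, so they would not count as a match at the assumed 0.5 threshold.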

The second problem, recognizing structure while parsing the table, is quite different from the first. As the authors point out, a difficulty arises because a table contains significantly more rows and columns than a document contains tables. Hence, they take a different approach: deep learning-based semantic segmentation tools, together with object detection methods applied to document images that were stretched vertically.
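
To see why vertical stretching can help, consider that adjacent table rows are often separated by only a few pixels. The sketch below stretches an image by repeating each pixel row, which widens the gaps between rows. It is a minimal illustration under stated assumptions: the image is a plain list of pixel rows, stretch_vertically is a hypothetical helper, and the resizing procedure and factor actually used in [1] may differ.

```python
def stretch_vertically(image, factor):
    """Repeat every pixel row `factor` times (nearest-neighbor upscaling).

    `image` is a list of rows, each row a list of pixel values.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [list(row) for row in image for _ in range(factor)]


# Two one-pixel "text lines" (1s) separated by a single blank pixel row (0s).
page = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
stretched = stretch_vertically(page, 3)
```

After stretching by a factor of 3, the blank gap between the two lines is three pixel rows tall instead of one, which gives a downstream model more room to separate neighboring rows.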

Their system, DeepDeSRT, is a combination of these two solutions.

GNNs by Qasim et al.

Like Schreiber et al., the authors point out the lack of available data as a major problem. The key difference is that, this time, the authors address it by generating a synthetic dataset of tables in four different formats, which they use for training.

Qasim et al. did not radically reimagine a solution for the problem. They maintained the same structure of having two steps: table detection and structure recognition. However, they focus almost entirely on the latter problem.

A key benefit of using GNNs for table detection is that they can take advantage of language features, which can indicate that a table exists in the document. The authors claim to be the first to use GNNs to solve this problem.

The architecture used for parsing structure is vastly different. They no longer treat it as an image detection problem. Instead, they build a graph in which each word becomes a node, and then infer structure from it. However, to take advantage of high-performing CNN- and OCR-based solutions, they also incorporate those into their architecture before creating the graph.
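
The idea of turning words into graph nodes can be sketched as follows. Each word, as it might come out of an OCR step, carries its text and bounding box, and each node is linked to its nearest neighbors on the page. This is a simplified illustration, not the architecture from [2]: the Word class, build_knn_graph, the fixed k, and the use of plain Euclidean distance between box centers are all assumptions.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Tuple


@dataclass
class Word:
    text: str
    box: Tuple[float, float, float, float]  # (left, top, right, bottom)

    @property
    def center(self) -> Tuple[float, float]:
        left, top, right, bottom = self.box
        return ((left + right) / 2, (top + bottom) / 2)


def build_knn_graph(words: List[Word], k: int = 2) -> Dict[int, List[int]]:
    """Map each word index to the indices of its k nearest words on the page."""
    graph = {}
    for i, word in enumerate(words):
        # Distance from this word's center to every other word's center.
        distances = sorted(
            (hypot(word.center[0] - other.center[0],
                   word.center[1] - other.center[1]), j)
            for j, other in enumerate(words)
            if j != i
        )
        graph[i] = [j for _, j in distances[:k]]
    return graph


# A tiny 2x2 table: a header row and one data row.
words = [
    Word("Name", (0, 0, 40, 10)),
    Word("Age", (100, 0, 130, 10)),
    Word("Ann", (0, 20, 30, 30)),
    Word("31", (100, 20, 115, 30)),
]
graph = build_knn_graph(words, k=1)
```

In this toy layout each word's nearest neighbor is the word in the same column, so even a purely geometric graph already carries a hint of the table structure. A learned model would then decide, for pairs of nodes, whether they share a row, a column, or a cell.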

Figure 2 on the second page of their paper shows their architecture. (Image not added here due to copyright issues.)

For evaluation, they consider four categories of tables (Figure 1, page 2 of their paper):

  1. "Normal" table with row and column lines clearly drawn, and no cells merged;
  2. Table with no cells merged, but with row and column lines not drawn;
  3. Mixed table with merged cells, columns, or rows; and
  4. Captured images from scanned documents that could be distorted.

Their model beats the baseline by a significant margin in every category and in almost every metric (Tables I and II on page 5). The model achieves excellent performance on tables of categories 1 and 2, performs well on category 4, and has significant difficulties with category 3. Thus, they conclude that their GNNs struggle with merged rows/columns but handle distortions in the images well.

Annotated Bibliography

[1] S. Schreiber, S. Agne, I. Wolf, A. Dengel, S. Ahmed, "DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images"

[2] S. R. Qasim, H. Mahmood, F. Shafait, "Rethinking Table Recognition using Graph Neural Networks"


Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0