Algorithm for Validation of Reference File with Wildcards

Samik :

I have a file like the one below, which I want to validate for correctness. It is used as a reference file for processing some data. I match my input data against ColA, ColB and ColC of this file and return OutA of the first matching row, scanning from the top. The wildcard '*' matches anything. For example, if my input data is X4 Y2 Z3, the lookup returns 13.

Seq  ColA  ColB  ColC  OutA
1    X1    Y1    Z1    10
2    X2    Y2    *     11
3    X3    *     Z2    12
4    *     Y2    Z3    13
5    *     *     Z4    14
6    *     Y3    Z4    15
7    *     *     *     16
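
For readers who want to try the first-match lookup described above, here is a minimal Python sketch. The names `RULES`, `WILDCARD` and `lookup` are illustrative assumptions, not part of the original question, and the rows are simply copied from the table.

```python
WILDCARD = "*"

# Rows copied from the table above: (ColA, ColB, ColC, OutA).
RULES = [
    ("X1", "Y1", "Z1", 10),
    ("X2", "Y2", "*", 11),
    ("X3", "*", "Z2", 12),
    ("*", "Y2", "Z3", 13),
    ("*", "*", "Z4", 14),
    ("*", "Y3", "Z4", 15),
    ("*", "*", "*", 16),
]


def lookup(rules, record):
    """Return OutA of the first row (from the top) matching record, else None."""
    for *pattern, out in rules:
        # A column matches if the pattern holds a wildcard or the same literal.
        if all(p == WILDCARD or p == v for p, v in zip(pattern, record)):
            return out
    return None


print(lookup(RULES, ("X4", "Y2", "Z3")))  # the example from the question
```

With this sketch, the input X9 Y3 Z4 resolves to row 5 and never reaches row 6, which is the situation the question goes on to describe.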

Now, the file can contain entries that are never used, i.e. unreachable. For example, if I receive X9 Y3 Z4 as my input, it matches row 5, so the lookup never reaches row 6 even though row 6 also matches. If we swap rows 5 and 6, it works as expected. I want to find such unreachable records before my actual process runs.

Any ideas on how to find such entries in the file? I am looking for an algorithm. Note that I have reduced the number of columns and rows in this example; the actual file has around 10 columns and 50 rows.

David Eisenstat :

Assuming that wildcards match every string (specifically, that for each column there exists a valid symbol that does not appear as a literal), it suffices to check each pair of rows to see whether the earlier row matches a superset of what the later row matches. This is the case if and only if, for each column: if the later row has a literal, then the earlier row has the same literal or a wildcard; and if the later row has a wildcard, then the earlier row has a wildcard too.
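
The pairwise check can be sketched in Python roughly as follows. This is only an illustration under the assumption stated above (every column admits some value that no row lists as a literal); the names `covers`, `find_shadowed` and `PATTERNS` are hypothetical, and the patterns are the ColA, ColB, ColC columns of the example file.

```python
WILDCARD = "*"

# ColA, ColB, ColC of the example file, in file order.
PATTERNS = [
    ("X1", "Y1", "Z1"),
    ("X2", "Y2", "*"),
    ("X3", "*", "Z2"),
    ("*", "Y2", "Z3"),
    ("*", "*", "Z4"),
    ("*", "Y3", "Z4"),
    ("*", "*", "*"),
]


def covers(earlier, later):
    """True if earlier matches a superset of what later matches.

    Per column: earlier must be a wildcard or equal to later's entry.
    When later holds a wildcard, equality forces earlier to be a wildcard too.
    """
    return all(e == WILDCARD or e == l for e, l in zip(earlier, later))


def find_shadowed(patterns):
    """Return (unreachable_seq, covering_seq) pairs, using 1-based Seq numbers."""
    shadowed = []
    for j, later in enumerate(patterns):
        for i in range(j):
            if covers(patterns[i], later):
                shadowed.append((j + 1, i + 1))
                break  # one covering earlier row is enough
    return shadowed


print(find_shadowed(PATTERNS))
```

With about 50 rows and 10 columns, comparing every pair of rows costs on the order of 50 * 50 * 10 column comparisons, so the quadratic loop should be cheap enough to run before the actual process.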

