PatternCounter
Features
This tool counts patterns in lists of sequential groups, using rules and variables.
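For the following examples, consider this file (sequences.txt), where each line is one sequence, elements of the same group are separated by spaces, -1 closes a group, and -2 ends the sequence: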
A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
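To make the file format concrete, here is a minimal Python sketch that reads sequences.txt into one list of groups (sets of elements) per line. The `parse` function is hypothetical and only illustrates the format as the examples use it; it is not part of PatternCounter's API.

```python
SEQUENCES_TXT = """\
A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
"""


def parse(text):
    """Hypothetical parser: one list of groups (sets of elements) per line."""
    sequences = []
    for line in text.splitlines():
        groups, current = [], set()
        for token in line.split():
            if token == "-1":  # -1 closes the current group
                groups.append(current)
                current = set()
            elif token == "-2":  # -2 ends the sequence
                break
            else:
                current.add(token)
        sequences.append(groups)
    return sequences


for index, groups in enumerate(parse(SEQUENCES_TXT)):
    print(index, [sorted(group) for group in groups])
```

Line 3, for example, becomes two groups: {A} followed by {B, C}.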
Example 1: Count how many sequences contain both elements A and B:
$ patterncounter count "A B" -n -f sequences.txt
Supp((A B)) = 0.6 | 3 lines: 2, 3, 4
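Supp is the fraction of sequences (lines) that match the pattern. The sketch below reproduces Example 1 in plain Python, assuming that "A B" means both elements appear anywhere in the sequence. `SEQUENCES`, `support`, and `contains_all` are hypothetical names used only for illustration.

```python
# Hypothetical pre-parsed copy of sequences.txt: one list of groups per line
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def support(matches):
    """Fraction of sequences accepted by `matches`, plus the matching line numbers."""
    lines = [i for i, seq in enumerate(SEQUENCES) if matches(seq)]
    return len(lines) / len(SEQUENCES), lines


def contains_all(*items):
    """Pattern "A B": every item appears somewhere in the sequence."""
    return lambda seq: set(items) <= set().union(*seq)


print(support(contains_all("A", "B")))  # (0.6, [2, 3, 4])
```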
Example 2: Count how many sequences contain elements A and B in the same group:
$ patterncounter count "A & B" -n -f sequences.txt
Supp(A & B) = 0.4 | 2 lines: 2, 4
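The difference from Example 1 is that both elements must share a single group. A minimal sketch of that check, assuming the same hypothetical pre-parsed data as before (`same_group` is an illustrative name, not the tool's API):

```python
# Hypothetical pre-parsed copy of sequences.txt: one list of groups per line
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def same_group(*items):
    """Pattern "A & B": all items appear together in at least one group."""
    return lambda seq: any(set(items) <= group for group in seq)


lines = [i for i, seq in enumerate(SEQUENCES) if same_group("A", "B")(seq)]
print(len(lines) / len(SEQUENCES), lines)  # 0.4 [2, 4]
```

Line 3 contains both A and B, but in different groups, so it matches "A B" but not "A & B".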
Example 3: Count how many sequences have an element B that occurs after A:
$ patterncounter count "A -> B" -n -f sequences.txt
Supp(A -> B) = 0.2 | 1 lines: 3
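A sketch of the "A -> B" check, assuming that B must appear in a group strictly later than a group containing A. This reading is inferred from the output: line 4 does not match, because its only B after an A-group shares that group with A. The names are hypothetical.

```python
# Hypothetical pre-parsed copy of sequences.txt: one list of groups per line
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def followed_by(first, then):
    """Assumed meaning of "A -> B": `then` is in a group after one containing `first`."""
    def matches(seq):
        return any(
            first in group and then in later
            for i, group in enumerate(seq)
            for later in seq[i + 1:]  # only strictly later groups count
        )
    return matches


lines = [i for i, seq in enumerate(SEQUENCES) if followed_by("A", "B")(seq)]
print(lines)  # [3]
```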
Example 4: Count in how many sequences the element B was removed within an interval of A:
$ patterncounter count "[A OutB]" -n -f sequences.txt
Supp([A OutB]) = 0.2 | 1 lines: 4
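The interval patterns are easier to follow with an explicit model. The sketch below assumes that an interval of A is a maximal run of consecutive groups containing A, and that B is removed when it is present in one group of that run and absent from a later group of the same run. Both definitions are assumptions consistent with the output above (only line 4 matches, where the A-interval is {A, B} then {A}); `intervals` and `removed_within` are hypothetical helpers.

```python
# Hypothetical pre-parsed copy of sequences.txt: one list of groups per line
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def intervals(seq, item):
    """Yield (start, end) of each maximal run of consecutive groups containing item."""
    start = None
    for i, group in enumerate(seq + [set()]):  # empty sentinel closes a trailing run
        if item in group and start is None:
            start = i
        elif item not in group and start is not None:
            yield start, i - 1
            start = None


def removed_within(item, removed):
    """Assumed meaning of "[A OutB]": inside an interval of `item`, `removed`
    is present in one group and absent from a later group of that interval."""
    def matches(seq):
        for start, end in intervals(seq, item):
            present = [removed in seq[k] for k in range(start, end + 1)]
            if any(was and not now
                   for k, was in enumerate(present) for now in present[k + 1:]):
                return True
        return False
    return matches


lines = [i for i, seq in enumerate(SEQUENCES) if removed_within("A", "B")(seq)]
print(lines)  # [4]
```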
Example 5: Count in how many sequences there is an element C that occurs after an interval of A:
$ patterncounter count "[A] -> C" -n -f sequences.txt
Supp([A] -> C) = 0.4 | 2 lines: 3, 4
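Under the same assumed definition of an interval (a maximal run of consecutive groups containing A), "[A] -> C" can be sketched as: C appears in some group after the end of an A-interval. The helper names are hypothetical.

```python
# Hypothetical pre-parsed copy of sequences.txt: one list of groups per line
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def intervals(seq, item):
    """Yield (start, end) of each maximal run of consecutive groups containing item."""
    start = None
    for i, group in enumerate(seq + [set()]):  # empty sentinel closes a trailing run
        if item in group and start is None:
            start = i
        elif item not in group and start is not None:
            yield start, i - 1
            start = None


def after_interval(item, then):
    """Assumed meaning of "[A] -> C": `then` occurs after an interval of `item`."""
    return lambda seq: any(
        then in group
        for _, end in intervals(seq, item)
        for group in seq[end + 1:]  # groups after the interval ends
    )


lines = [i for i, seq in enumerate(SEQUENCES) if after_interval("A", "C")(seq)]
print(lines)  # [3, 4]
```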
Example 6: Show results even when the pattern does not occur in any sequence:
$ patterncounter count "Z" -n -f sequences.txt -z
Supp(Z) = 0.0 | 0 lines
In addition to using simple rules, it is possible to define multiple rules and calculate association rule metrics between them:
Example 7: Count both how many sequences have an interval of A, and how many sequences have an interval of A with an element B inside:
$ patterncounter count "[A]" "[A B]" -n -f sequences.txt
Supp([A], [A B]) = 0.4 | 2 lines: 2, 4
Association Rule: [A] ==> [A B]
Supp([A]) = 0.8 | 4 lines: 0, 2, 3, 4
Supp([A B]) = 0.4 | 2 lines: 2, 4
Conf = 0.5
Lift = 1.25
Association Rule: [A B] ==> [A]
Supp([A B]) = 0.4 | 2 lines: 2, 4
Supp([A]) = 0.8 | 4 lines: 0, 2, 3, 4
Conf = 1.0
Lift = 1.25
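The Conf and Lift values above agree with the usual association rule definitions: for X ==> Y, Conf = Supp(X and Y) / Supp(X) and Lift = Conf / Supp(Y), where Supp(X and Y) is the fraction of lines matched by both rules. A small sketch that recomputes them from the line lists of Example 7 (`rule_metrics` is a hypothetical helper, not part of the tool):

```python
def rule_metrics(lines_x, lines_y, total):
    """Confidence and lift of the rule X ==> Y from the lines where X and Y match."""
    supp_x = len(lines_x) / total
    supp_y = len(lines_y) / total
    supp_xy = len(set(lines_x) & set(lines_y)) / total  # lines where both match
    conf = supp_xy / supp_x
    lift = conf / supp_y
    return conf, lift


A_LINES = [0, 2, 3, 4]  # lines of Supp([A]) in Example 7
AB_LINES = [2, 4]       # lines of Supp([A B]) in Example 7

print(rule_metrics(A_LINES, AB_LINES, 5))  # [A] ==> [A B]
print(rule_metrics(AB_LINES, A_LINES, 5))  # [A B] ==> [A]
```

Because Supp(X and Y) is symmetric, both directions share the same lift; only the confidence changes.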
It is also possible to define variables.
Example 8: Count how many sequences have a group containing two distinct elements:
$ patterncounter count "x & y" -v "x" -v "y" -n -f sequences.txt -z
Supp(x & y) = 0.6 | 3 lines: 2, 3, 4
[BINDING: x = B; y = A]
Supp(B & A) = 0.4 | 2 lines: 2, 4
[BINDING: x = A; y = B]
Supp(A & B) = 0.4 | 2 lines: 2, 4
[BINDING: x = B; y = C]
Supp(B & C) = 0.2 | 1 lines: 3
[BINDING: x = A; y = C]
Supp(A & C) = 0.0 | 0 lines
[BINDING: x = C; y = B]
Supp(C & B) = 0.2 | 1 lines: 3
[BINDING: x = C; y = A]
Supp(C & A) = 0.0 | 0 lines
Note that the output first shows the combined support over all bindings (the union of the lines they match), followed by one result per binding.
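Variable expansion can be sketched as trying every assignment of elements to variables and taking the union of the matched lines. The sketch assumes that distinct variables bind to distinct elements, which matches the output above (there is no x = A; y = A binding, and the union is 0.6 rather than 0.8). It iterates bindings in sorted order, which differs from the order printed by the tool. All names are hypothetical.

```python
from itertools import permutations

# Hypothetical pre-parsed copy of sequences.txt: one list of groups per line
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def same_group_lines(x, y):
    """Lines where x and y appear together in one group (the pattern "x & y")."""
    return [i for i, seq in enumerate(SEQUENCES) if any({x, y} <= g for g in seq)]


# Assumption: each variable binds to a different element
elements = sorted(set().union(*(g for seq in SEQUENCES for g in seq)))
per_binding = {(x, y): same_group_lines(x, y) for x, y in permutations(elements, 2)}

# Combined result: union of the lines matched by any binding
union = sorted(set().union(*per_binding.values()))
print(f"Supp(x & y) = {len(union) / len(SEQUENCES)} | lines: {union}")
for (x, y), lines in per_binding.items():
    print(f"x = {x}; y = {y}: {lines}")
```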
Finally, given a file of sequences, it is also possible to select specific lines from it (0-indexed):
$ patterncounter select -f sequences.txt -n 4
0| A -1 -2
2| A B -1 -2
4| B -1 A B -1 A -1 C -1 -2
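The output format of the selection is "index| line". A minimal sketch of the same operation, with the indexes 0, 2 and 4 taken from the output shown above; `select` here is a hypothetical Python helper, not the CLI command itself.

```python
SEQUENCES_TXT = """\
A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
"""


def select(text, indexes):
    """Return the chosen 0-indexed lines formatted as 'index| line'."""
    lines = text.splitlines()
    return [f"{i}| {lines[i]}" for i in indexes]


for row in select(SEQUENCES_TXT, [0, 2, 4]):
    print(row)
```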
Installation
You can install PatternCounter via pip from PyPI:
$ pip install patterncounter
Usage
Please see the Command-line Reference for details.
Contributing
Contributions are very welcome. To learn more, see the Contributor Guide.
License
Distributed under the terms of the MIT license, PatternCounter is free and open source software.
Issues
If you encounter any problems, please file an issue along with a detailed description.
Credits
This project was generated from @cjolowicz’s Hypermodern Python Cookiecutter template.