PatternCounter#

Read the documentation at https://patterncounter.readthedocs.io/

Features#

This tool counts patterns in lists of sequential groups using rules and variables.

The examples below use the following file (sequences.txt):

A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
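
The file appears to follow an SPMF-style layout in which `-1` closes a group and `-2` closes a sequence. That reading is an assumption inferred from the examples below. Under that assumption, a minimal Python sketch can turn each line into a list of groups (the helper `parse_sequence` is hypothetical and not part of PatternCounter):

```python
def parse_sequence(line):
    """Split one line into a list of groups (sets of elements).

    Assumption: -1 closes a group and -2 closes the sequence.
    """
    groups, current = [], set()
    for token in line.split():
        if token == "-2":
            break
        if token == "-1":
            groups.append(current)
            current = set()
        else:
            current.add(token)
    return groups


text = """A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2"""

sequences = [parse_sequence(line) for line in text.splitlines()]
for number, groups in enumerate(sequences):
    # Line numbers are 0-indexed, matching the tool's output.
    print(number, [sorted(group) for group in groups])
```

Read this way, line 3 has two groups (`A`, then `B C`) and line 4 has four.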

Example 1: Count how many sequences contain both the elements A and B:

$ patterncounter count "A B" -n -f sequences.txt
Supp((A B)) = 0.6 | 3 lines: 2, 3, 4

Example 2: Count how many sequences contain elements A and B in the same group:

$ patterncounter count "A & B" -n -f sequences.txt
Supp(A & B) = 0.4 | 2 lines: 2, 4
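
To make the difference between Examples 1 and 2 concrete, the sketch below recomputes both supports in plain Python. It is an illustration only, not PatternCounter's implementation: the function names are hypothetical, and the sequences are written out by hand assuming `-1` ends a group and `-2` ends a sequence.

```python
# sequences.txt written out as lists of groups (assumed reading of -1 / -2).
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def anywhere(sequence, elements):
    """True if every element occurs in some group of the sequence."""
    present = set().union(*sequence)
    return elements <= present


def same_group(sequence, elements):
    """True if a single group contains all the elements."""
    return any(elements <= group for group in sequence)


def support(predicate, elements):
    """Return (support, matching 0-indexed line numbers)."""
    lines = [i for i, seq in enumerate(SEQUENCES) if predicate(seq, elements)]
    return len(lines) / len(SEQUENCES), lines


print(support(anywhere, {"A", "B"}))    # (0.6, [2, 3, 4])
print(support(same_group, {"A", "B"}))  # (0.4, [2, 4])
```

Line 3 is the one that separates the two rules: it contains both A and B, but in different groups.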

Example 3: Count how many sequences have an element B that occurs after A:

$ patterncounter count "A -> B" -n -f sequences.txt
Supp(A -> B) = 0.2 | 1 lines: 3
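
One reading consistent with this result is that `A -> B` requires B to appear in a group strictly later than a group containing A. The sketch below checks that reading against the sample file. The helper `followed_by` is hypothetical, and the group layout is an assumption (`-1` ends a group, `-2` ends a sequence).

```python
# sequences.txt written out as lists of groups (assumed reading of -1 / -2).
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def followed_by(sequence, first, second):
    """True if `second` occurs in a group strictly after a group with `first`."""
    for i, group in enumerate(sequence):
        if first in group:
            # The earliest occurrence of `first` leaves the most room for `second`.
            return any(second in later for later in sequence[i + 1:])
    return False


lines = [i for i, seq in enumerate(SEQUENCES) if followed_by(seq, "A", "B")]
print(len(lines) / len(SEQUENCES), lines)  # 0.2 [3]
```

Line 4 does not match under this reading because its only B groups come before or together with A, never after it.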

Example 4: Count in how many sequences the element B was removed within an interval of A:

$ patterncounter count "[A OutB]" -n -f sequences.txt
Supp([A OutB]) = 0.2 | 1 lines: 4

Example 5: Count in how many sequences there is an element C that occurs after an interval of A:

$ patterncounter count "[A] -> C" -n -f sequences.txt
Supp([A] -> C) = 0.4 | 2 lines: 3, 4
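
The interval rules in Examples 4 and 5 can be approximated as follows. This is only a sketch under stated assumptions, not the tool's actual semantics: an interval of A is taken to be a maximal run of consecutive groups that contain A, "OutB" is taken to mean that B is present in one group of the interval and absent from the next, and all function names are hypothetical.

```python
# sequences.txt written out as lists of groups (assumed reading of -1 / -2).
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def intervals(sequence, element):
    """Return (start, end) index pairs of maximal runs of groups with `element`."""
    runs, start = [], None
    for i, group in enumerate(sequence):
        if element in group and start is None:
            start = i
        elif element not in group and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(sequence) - 1))
    return runs


def removed_within(sequence, owner, element):
    """Assumed meaning of [owner Out<element>]: element disappears inside an interval."""
    return any(
        element in sequence[i] and element not in sequence[i + 1]
        for start, end in intervals(sequence, owner)
        for i in range(start, end)
    )


def occurs_after_interval(sequence, owner, element):
    """Assumed meaning of [owner] -> element: element appears after an interval ends."""
    return any(
        element in group
        for _, end in intervals(sequence, owner)
        for group in sequence[end + 1:]
    )


def matching(predicate, owner, element):
    return [i for i, seq in enumerate(SEQUENCES) if predicate(seq, owner, element)]


print(matching(removed_within, "A", "B"))         # [4]
print(matching(occurs_after_interval, "A", "C"))  # [3, 4]
```

Under these assumptions the sketch reproduces the line numbers reported in Examples 4 and 5 for this file; other inputs may expose differences from the real rule semantics.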

Example 6: Show results even when the pattern does not exist:

$ patterncounter count "Z" -n -f sequences.txt -z
Supp(Z) = 0.0 | 0 lines

In addition to simple rules, it is possible to define multiple rules and calculate association rule metrics among them:

Example 7: Count both how many sequences have an interval of A, and how many sequences have an interval of A with an element B inside:

$ patterncounter count "[A]" "[A B]" -n -f sequences.txt
Supp([A], [A B]) = 0.4 | 2 lines: 2, 4
Association Rule: [A] ==> [A B]
  Supp([A]) = 0.8 | 4 lines: 0, 2, 3, 4
  Supp([A B]) = 0.4 | 2 lines: 2, 4
  Conf = 0.5
  Lift = 1.25
Association Rule: [A B] ==> [A]
  Supp([A B]) = 0.4 | 2 lines: 2, 4
  Supp([A]) = 0.8 | 4 lines: 0, 2, 3, 4
  Conf = 1.0
  Lift = 1.25
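
The Conf and Lift values above match the usual association rule definitions, where Conf(X ==> Y) = Supp(X, Y) / Supp(X) and Lift(X ==> Y) = Conf(X ==> Y) / Supp(Y). The short sketch below redoes the arithmetic from the line counts in the output. The function `rule_metrics` is a hypothetical helper, and the assumption is that PatternCounter uses these standard formulas.

```python
def rule_metrics(joint_lines, antecedent_lines, consequent_lines, total):
    """Return (confidence, lift) computed from line counts.

    conf = joint / antecedent
    lift = conf / supp(consequent) = joint * total / (antecedent * consequent)
    """
    conf = joint_lines / antecedent_lines
    lift = joint_lines * total / (antecedent_lines * consequent_lines)
    return conf, lift


TOTAL = 5        # sequences in sequences.txt
LINES_A = 4      # [A] matches lines 0, 2, 3, 4
LINES_AB = 2     # [A B] matches lines 2, 4
LINES_JOINT = 2  # both rules match lines 2, 4

print(rule_metrics(LINES_JOINT, LINES_A, LINES_AB, TOTAL))  # (0.5, 1.25)  [A] ==> [A B]
print(rule_metrics(LINES_JOINT, LINES_AB, LINES_A, TOTAL))  # (1.0, 1.25)  [A B] ==> [A]
```

Lift is symmetric in the two rules, which is why both directions report 1.25 while the confidences differ.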

It is also possible to define variables.

Example 8: Count how many sequences have groups with two distinct elements:

$ patterncounter count "x & y" -v "x" -v "y" -n -f sequences.txt -z
Supp(x & y) = 0.6 | 3 lines: 2, 3, 4

[BINDING: x = B; y = A]
  Supp(B & A) = 0.4 | 2 lines: 2, 4

[BINDING: x = A; y = B]
  Supp(A & B) = 0.4 | 2 lines: 2, 4

[BINDING: x = B; y = C]
  Supp(B & C) = 0.2 | 1 lines: 3

[BINDING: x = A; y = C]
  Supp(A & C) = 0.0 | 0 lines

[BINDING: x = C; y = B]
  Supp(C & B) = 0.2 | 1 lines: 3

[BINDING: x = C; y = A]
  Supp(C & A) = 0.0 | 0 lines

Note that the output first shows the combined metrics, computed over the union of the lines matched by all bindings.
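
The binding output can be mimicked by trying every ordered pair of distinct elements and then taking the union of the matched lines. The sketch below does that with `itertools.permutations`. It is an assumption about how variables behave, based only on the output above: the names are hypothetical, the group layout assumes `-1` ends a group and `-2` ends a sequence, and the order of bindings will not match the tool's.

```python
from itertools import permutations

# sequences.txt written out as lists of groups (assumed reading of -1 / -2).
SEQUENCES = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]

# Every element that appears anywhere in the file.
ELEMENTS = sorted({item for seq in SEQUENCES for group in seq for item in group})


def lines_with_pair(x, y):
    """0-indexed lines where x and y share a group (the rule "x & y")."""
    return [i for i, seq in enumerate(SEQUENCES) if any({x, y} <= g for g in seq)]


# One entry per binding of (x, y) to distinct elements.
bindings = {(x, y): lines_with_pair(x, y) for x, y in permutations(ELEMENTS, 2)}

# Combined result: union of the lines matched by any binding.
combined = sorted({line for lines in bindings.values() for line in lines})
print(len(combined) / len(SEQUENCES), combined)  # 0.6 [2, 3, 4]

for (x, y), lines in bindings.items():
    print(f"x = {x}; y = {y}: {len(lines) / len(SEQUENCES)} {lines}")
```

Three elements give six ordered bindings, which matches the six BINDING blocks in the output.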

Finally, given a file of sequences, it is also possible to select its lines (0-indexed):

$ patterncounter select -f sequences.txt -n 4
0| A -1 -2
2| A B -1 -2
4| B -1 A B -1 A -1 C -1 -2

Installation#

You can install PatternCounter via pip from PyPI:

$ pip install patterncounter

Usage#

Please see the Command-line Reference for details.

Contributing#

Contributions are very welcome. To learn more, see the Contributor Guide.

License#

Distributed under the terms of the MIT license, PatternCounter is free and open source software.

Issues#

If you encounter any problems, please file an issue along with a detailed description.

Credits#

This project was generated from @cjolowicz’s Hypermodern Python Cookiecutter template.