I am trying to create a simple implementation of the Flajolet-Martin algorithm in Python. The stream will be the contents of a text file, and the program must produce an approximation of the number of unique words in the file as given by the algorithm. The file has to be processed one line at a time, and no part of the file may be stored. Words can be obtained by splitting each line on whitespace. The code will be run from a terminal according to the following command
The text file is:
this is a fun file
this is the second line of the file
this is the third line of the file
this is the fourth and final line of the file
import sys

for line in sys.stdin:
    words = line.split()
    for word in words:
        # Binary representation of the word's hash value
        bin_string = bin(hash(word))
        print(bin_string)
I tried that and received a syntax error. Can you show me?
The code below would "work":
import sys

strings = set()
for line in sys.stdin:
    words = line.split()
    for word in words:
        # Remember every distinct word seen so far
        strings.add(word)
print("\nNumber of distinct elements:", len(strings))
By "work" I mean the code runs and calculates the exact number of distinct elements in the input.
However, this code definitely does not implement the Flajolet-Martin algorithm.
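For comparison, here is a minimal sketch of what a Flajolet-Martin style estimate could look like. It is the simplified single-hash variant (track the largest number of trailing zero bits R over all hashed words and report 2^R), which may not be the exact form the assignment expects. The names stable_hash, trailing_zeros and flajolet_martin are made up for this example, and SHA-256 truncated to 64 bits is my own choice of hash: the built-in hash() of a string is salted per process unless PYTHONHASHSEED is set, so it would give a different estimate on every run.

```python
import hashlib


def stable_hash(word: str) -> int:
    """Hypothetical helper: 64-bit hash that is identical across runs."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def trailing_zeros(n: int, width: int = 64) -> int:
    """Count the trailing zero bits of n (returns width when n is 0)."""
    if n == 0:
        return width
    # n & -n isolates the lowest set bit; its position is the zero count
    return (n & -n).bit_length() - 1


def flajolet_martin(lines) -> int:
    """Estimate the number of distinct words in an iterable of lines.

    Only one integer of state is kept, so no part of the stream is stored.
    """
    max_zeros = -1  # -1 means no word has been seen yet
    for line in lines:
        for word in line.split():
            max_zeros = max(max_zeros, trailing_zeros(stable_hash(word)))
    return 0 if max_zeros < 0 else 2 ** max_zeros


# Demo on the sample file from the question.
# For the real task, pass sys.stdin instead of this list.
sample = [
    "this is a fun file",
    "this is the second line of the file",
    "this is the third line of the file",
    "this is the fourth and final line of the file",
]
print("Estimated number of distinct elements:", flajolet_martin(sample))
```

With a single hash function the estimate is always a power of two and can be far from the true count. The usual remedy is to combine the results of several hash functions, which this sketch does not do.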