Hello, OnlineGDB Q&A section lets you put your programming query to fellow community users. Asking a solution for whole assignment is strictly not allowed. You may ask for help where you are stuck. Try to add as much information as possible so that fellow users can know about your problem statement easily.

please help me. I am stuck

+8 votes
asked Apr 25, 2022 by Paul Hill (290 points)

I am trying to create a simple implementation of the FlajoletMartin algorithm using Python. The stream will be the contents of a text file and you will produce an approximation of the number of unique words in the file as given by the algorithm. You will need to process the file one line at a time and may not store any part of the file. You can obtain words by splitting the lines on whitespace. Your code will be run from a terminal according to the following command

The text file is:

HERE IS THE TEXT FILE:

this is a fun file
this is the second line of the file
this is the third line of the file
this is the fourth and final line of the file

import sys

for line in sys.stdin:
words = line.split()
for word in words:
bin_string = bin(hash(word))
print(bin_string)

2 Answers

0 votes
answered Apr 25, 2022 by Eveheeero (140 points)
use set for get unique values, it uses hash algorithm
commented Apr 25, 2022 by Paul Hill (290 points)
I tried that.  And I receive a syntax error.  Can you show me?
+1 vote
answered Apr 26, 2022 by Peter Minarik (84,720 points)

I tried that.  And I receive a syntax error.  Can you show me?

The below code would "work".

import sys

strings = set()

for line in sys.stdin:
    words = line.split()
    for word in words:
        strings.add(word)

print("\nNumber of distinct elements:", len(strings))

By "work" I mean the code runs and calculates the exact number of distinct elements in the input.

However, this code definitely does not implement the Flajolet-Martin algorithm.

commented Apr 26, 2022 by Paul Hill (290 points)
Does this run in the terminal according to the following command
commented Apr 26, 2022 by Peter Minarik (84,720 points)
edited Apr 26, 2022 by Peter Minarik
It runs in OnlineGDB just fine. Make sure on the bottom of the page you select INPUT --> TEXT (not Interactive Console) so you can paste your file's content there.

It also should run just fine on any computer if you redirect your files input into your executable.

e.g.:

./WordCount.py < MyInput.txt

I hope this helps.
commented Apr 26, 2022 by Paul Hill (290 points)
Is the any way you can try to do the Flajolet algorithm with it.    I can do it with a list of number just fine.

stream=[1,2,3,4,5,6,4,2,5,9,1,6,3,7,1,2,2,4,2,1]


maxnum=0
for i in range(0,len(stream)):
    val= bin((1*stream[i] + 6) % 32)[2:]
    
    sum=0
    for j in range(len(val)-1,0,-1):
        
        if val[j]=='0':
            sum+=1
        else:
            break
    if sum>maxnum:
        maxnum=sum
        
print('distict elements', 2**maxnum)

is there any way to modify both codes so it is more like the algorithm
commented Apr 28, 2022 by Peter Minarik (84,720 points)
Isn't it as simple as you tried in your original code using the hash() function? Get the hash of a word and use that, instead of stream[i].
Welcome to OnlineGDB Q&A, where you can ask questions related to programming and OnlineGDB IDE and and receive answers from other members of the community.
...