a MUggle's Text Understanding eXpedition

Jul 19, 2017
Ukkonen's Suffix Tree Algorithm Implemented in Python

I implemented Suffix Tree algorithm in Python in the past few days, you can refer to my github repository if you’re interested in. The famous tutorial on stackoverflow is a good start. If you pay enough attention to some details like state updating or suffix links managing, you can write the code by yourself for sure. When a lot of small operations are entangling together, the ancient trick flowchart is always a wise choice for assisting. Also a tree visualizer might help a lot with debugging for sure. In order to make the intervene of the processes easy to understand, we demonstrate the flowchart in this post.
Jun 19, 2017
TwEater: A Python bot for scraping more tweets and comments

Today, I submitted my way of scraping tweets on Twitter - TwEater on GitHub.
May 7, 2017
String algorithm: Suffix Tries

Trie

A trie is a tree representing a collection of strings with one node per common prefix. It is a natural tool to manage either a set or a map where keys are strings. In a trie tree:
Apr 26, 2017
String Algorithm: Bloom Filter

Just like hash table with multiple \((k)\) hash functions. The searching time complexity of bloom filter is \(O(k)\), just as that of hash table \(O(1)\). For hash table, in order to keep the the false positive rate sufficiently low, the array must be very large to make sure that only a small fraction of bits are set to 1. Whereas for Bloom filter, the fraction could be relatively big (~50%) if the number \(k\) (the number of hash functions) and \(m\) (the size of bit array) are properly set.
Apr 17, 2017
String Algorithms: KMP in python

KMP is fantastic. There are some practical ideology in its design.

« 3 / 4 »

Ukkonen's Suffix Tree Algorithm Implemented in Python

TwEater: A Python bot for scraping more tweets and comments

String algorithm: Suffix Tries

Trie

String Algorithm: Bloom Filter

String Algorithms: KMP in python