Finding relevant books without sacrificing your privacy
Wed 25 Aug 2021
Web retailers such as Amazon.com are able to find just the right book for you. This is a great feature, but it comes at a cost: the recommendations work because the retailer stores information about you. The better it knows you, the better its recommendations.
At OAPEN, we do not track people. Instead, we use the full text of the open access books and chapters in our collection. In an experiment based on over 10,000 titles, we took the complete text of each book, cut it into blocks of three consecutive words (called trigrams) and filtered out all the common phrases. This leaves a small group of terms that are characteristic of that particular book. The next phase is finding other titles that share the same terms. The more terms two titles share, the more closely they are connected.
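To make the idea concrete, here is a minimal sketch in Python. It is not the code used in the experiment. The function names, the tiny sample collection and the `max_share` cut-off are all assumptions made for illustration. In particular, this post does not say how common phrases were identified, so the sketch simply drops any trigram that occurs in more than a chosen share of the books.

```python
import re
from collections import Counter


def trigrams(text):
    """Return the set of three-word sequences found in a text."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + 3]) for i in range(len(words) - 2)}


def distinctive_terms(books, max_share):
    """Keep, per book, only trigrams that are rare across the collection.

    Assumption: a trigram counts as a "common phrase" when it appears
    in more than max_share of all books.
    """
    per_book = {title: trigrams(text) for title, text in books.items()}
    book_count = Counter(t for grams in per_book.values() for t in grams)
    limit = max_share * len(books)
    return {
        title: {t for t in grams if book_count[t] <= limit}
        for title, grams in per_book.items()
    }


def related(title, terms):
    """List other titles sharing terms with this one, most shared first."""
    scores = [
        (other, len(terms[title] & grams))
        for other, grams in terms.items()
        if other != title
    ]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: -s[1])


# Hypothetical three-book collection, only to show the mechanics.
books = {
    "A": "open access books help readers find research on medieval trade routes",
    "B": "scholars study medieval trade routes and open access books help readers",
    "C": "open access books help readers learn about marine biology",
}
terms = distinctive_terms(books, max_share=0.7)
print(related("A", terms))
```

In this toy collection, phrases such as "open access books" occur in every book and are filtered out, so book A is linked only to book B, through the shared term "medieval trade routes". Nothing about a reader is stored at any point: the only input is the text of the books.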
Using this algorithm helps to find books that are very similar: if you are interested in a certain book, you may want to download these books as well. However, it can also find books that are a little less similar: you might use this to expand your research, or to create a collection of books. Surprisingly enough, this algorithm can also find translations; it even works across languages.
Finding related titles in this way does not have to be confined to the OAPEN Library. The same method can be applied to other collections of open access books or even open access journal articles.
More information can be found in this article:
Snijder, R. (2021). Words Algorithm Collection—Finding closely related open access books using text mining techniques. LIBER Quarterly: The Journal of the Association of European Research Libraries, 31(1). https://liberquarterly.eu/article/view/10938