![Clustering And Crawling Of News Articles](https://jonasbecker.net/wp-content/uploads/2023/04/crawl-cluster.png)
Description
An open-source project for crawling news articles from different websites with the help of the CommonCrawl project. It also includes a multi-level clustering algorithm that uses K-Means and Latent Dirichlet Allocation (LDA) to sort ~286,000 articles into their relevant topics.
The project consists of the code, published on GitHub, and two datasets: one unclustered and one clustered.
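
To give a rough idea of what multi-level clustering with K-Means and LDA can look like, here is a minimal sketch using scikit-learn. It is not the project's actual code: the order of the levels (K-Means first, LDA inside each coarse cluster), the function name `two_level_cluster`, the parameter values, and the toy `docs` list are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer


def two_level_cluster(docs, n_clusters=2, n_topics=2, seed=0):
    """Return one (cluster_id, topic_id) pair per document (hypothetical helper)."""
    # Level 1 (assumed): K-Means on TF-IDF vectors gives coarse groups.
    tfidf = TfidfVectorizer().fit_transform(docs)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    coarse = kmeans.fit_predict(tfidf)

    labels = [None] * len(docs)
    for cluster_id in range(n_clusters):
        members = [i for i, c in enumerate(coarse) if c == cluster_id]
        if not members:
            continue
        # Level 2 (assumed): LDA on raw term counts within each coarse group.
        counts = CountVectorizer().fit_transform([docs[i] for i in members])
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
        doc_topics = lda.fit_transform(counts)  # rows are per-document topic weights
        for i, row in zip(members, doc_topics):
            # Assign each article to its most probable topic.
            labels[i] = (int(cluster_id), int(row.argmax()))
    return labels


# Toy input standing in for crawled articles.
docs = [
    "the team won the football match last night",
    "the striker scored twice in the cup final",
    "the central bank raised interest rates again",
    "stock markets fell after the inflation report",
    "the coach praised the goalkeeper after the match",
    "investors expect the bank to cut rates next year",
]
labels = two_level_cluster(docs)
```

Each article ends up with a pair of labels, a coarse cluster and a topic within that cluster. On a real corpus the number of clusters and topics would need tuning, and the text would usually be cleaned (stop words, boilerplate) before vectorizing.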