Trending Papers is a project that aims to organize computer science research in a logical, simple, and easy-to-follow way.
It is designed to help you find what's worth reading first.
I started building Trending Papers for the following reasons:
The data comes from Arxiv, a curated research-sharing platform with over 2 million articles open to anyone. Approximately a quarter of them are in the field of computer science. For more information about Arxiv, please visit arxiv.org.
The system runs an update process every day, comparing the current state of the local dataset with Arxiv's dataset (the source of truth).
The last update was completed on Thursday, September 26, 2024 - 08:51 UTC.
The PageRank number is the ranking order of each paper in the dataset. The lower the number, the more important a paper is (1 is the highest rank), according to an adapted version of the original PageRank algorithm developed by Google founders Larry Page and Sergey Brin in 1997.
The PageRank algorithm is a formula Google Search uses to rank web pages in its search engine results. It was developed by Google founders Larry Page and Sergey Brin in 1997 to evaluate the quality and quantity of links to a page. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. The algorithm assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The numerical weight that it assigns to any given element is referred to as the PageRank of that element. The PageRank algorithm outputs a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at any particular page. For more information about PageRank, please check the Google founders' original paper.
Trending Papers considers the links between papers instead of web pages. It continuously parses the papers’ references, updating the graph of interconnected papers in the dataset. The system then uses this graph to compute the weight of each paper in the graph. This weight is a probability: it represents the likelihood that a person randomly following paper references will arrive at any particular paper.
There is one important adaptation to the original algorithm. For uncited papers, the system estimates this probability based on features such as the paper's age, who the authors are, the order in which the authors appear in the paper, etc.
The system shows the PageRank ranking order, not the PageRank weights (the probabilities), as the weights are very small for most papers.
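The idea described above can be sketched in a few lines of Python. This is only an illustration of textbook PageRank on a toy citation graph, not the adapted version Trending Papers actually runs: the damping factor of 0.85, the iteration count, the function names, and the three-paper graph are all assumptions, and the feature-based estimate for uncited papers is not modeled.

```python
def pagerank(references, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    `references` maps each paper id to the list of paper ids it cites.
    The damping factor and iteration count are assumed values.
    """
    papers = set(references)
    for cited in references.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)
    weights = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Every paper receives a baseline share from the "random jump".
        new = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = references.get(p, [])
            if cited:
                # A paper splits its weight evenly among the papers it cites.
                share = damping * weights[p] / len(cited)
                for c in cited:
                    new[c] += share
            else:
                # A paper with no parsed references spreads its weight evenly.
                share = damping * weights[p] / n
                for q in papers:
                    new[q] += share
        weights = new
    return weights


def ranking_order(weights):
    """Convert weights to rank positions, where 1 is the most important."""
    ordered = sorted(weights, key=lambda p: (-weights[p], p))
    return {p: i + 1 for i, p in enumerate(ordered)}


# Hypothetical toy graph: B and C both cite A; C also cites B.
references = {"A": [], "B": ["A"], "C": ["A", "B"]}
weights = pagerank(references)
ranks = ranking_order(weights)
```

In this toy graph the weights sum to 1, as a probability distribution should, and paper A, which is cited by both other papers, ends up with rank 1.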
The Growth number is the percentage growth in a paper's PageRank weight over the past 12 months. For example, a paper whose PageRank weight is 0.001 today and was 0.0005 12 months ago has a Growth number of +100%. This means that the paper doubled its importance in the graph during the last year.
For papers released less than 1 year but more than 3 months ago, the Growth shows the percentage growth in PageRank weight from the day the paper received its first citation to today. For papers released less than 3 months ago, the system does not compute the Growth, as the numbers are too unstable.
Naturally, the Growth is only computed for papers that have been cited at least once.
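The Growth rules above can be written as a small function. This is a minimal sketch under stated assumptions: the function names are hypothetical, "3 months" is approximated as 90 days, and the caller is expected to pass the right baseline weight (the weight 12 months ago for older papers, or the weight on the day of the first citation for papers under 1 year old).

```python
def growth_percent(current_weight, baseline_weight):
    """Percentage change from baseline_weight to current_weight."""
    return (current_weight - baseline_weight) / baseline_weight * 100.0


def growth_for_paper(age_days, citations, current_weight, baseline_weight):
    """Return the Growth in percent, or None when it is not computed.

    Assumption: "3 months" is approximated as 90 days.
    """
    if citations < 1:
        return None  # uncited papers have no Growth number
    if age_days < 90:
        return None  # too recent: the numbers are too unstable
    return growth_percent(current_weight, baseline_weight)


# The example from the text: 0.0005 a year ago, 0.001 today.
example = growth_for_paper(
    age_days=800, citations=12, current_weight=0.001, baseline_weight=0.0005
)
```

Here `example` corresponds to the +100% case described above, while a paper that is 30 days old, or one with no citations, yields `None`.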
The Citations number is the number of other papers in the dataset that cite that particular paper.
Important: The system uses a number of natural language processing techniques to extract, parse, and identify references from each paper in the dataset, which might cause some references to go unrecognized. Thus, the number shown might be a bit lower than the real number.
The number of citations in the last 3 months ("in 3 mos." for short) is the number of other papers in the dataset that cited that particular paper within the last 3 months. This metric might help us uncover fast-trending papers in the dataset.
The PageRank evolution chart is a visual representation of how a paper's PageRank weight has evolved over the past year. As this is an expensive computation, the charts are updated once a week.
While on the main page, you can filter by category, release date, and whether the papers have any citations. On the search results page, you can filter by release date and whether papers have any citations.
The papers shown are the ones that belong to the combined subset of the dataset once all the filters are applied. For example, if you apply a category filter of "Machine Learning", a released on filter of "past 6 months", and a show filter of "only cited papers", you will see only Machine Learning papers released within the past 6 months and that are cited by at least 1 other paper in the dataset.
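Combining filters amounts to a logical AND: a paper is shown only if it passes every active filter. The sketch below illustrates this with a hypothetical `Paper` record and `apply_filters` function; the field names, the sample papers, and the approximation of "6 months" as 183 days are assumptions made for the example, not the site's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    # Hypothetical record; field names are assumptions for this sketch.
    title: str
    categories: tuple
    released_on: date
    citations: int


def apply_filters(papers, today, category=None, max_age_days=None, cited=None):
    """Keep only the papers that pass every active filter (logical AND).

    A filter left as None is inactive. `cited=True` keeps only cited
    papers, `cited=False` keeps only uncited ones.
    """
    result = []
    for paper in papers:
        if category is not None and category not in paper.categories:
            continue
        if max_age_days is not None and (today - paper.released_on).days > max_age_days:
            continue
        if cited is True and paper.citations < 1:
            continue
        if cited is False and paper.citations > 0:
            continue
        result.append(paper)
    return result


today = date(2024, 9, 26)
papers = [
    Paper("Recent cited ML paper", ("Machine Learning",), date(2024, 6, 1), 3),
    Paper("Recent uncited ML paper", ("Machine Learning",), date(2024, 7, 1), 0),
    Paper("Old cited ML paper", ("Machine Learning",), date(2020, 1, 1), 50),
    Paper("Recent cited HCI paper", ("Human-Computer Interaction",), date(2024, 8, 1), 2),
]

# "Machine Learning" + "past 6 months" (about 183 days) + "only cited papers".
selected = apply_filters(
    papers, today, category="Machine Learning", max_age_days=183, cited=True
)
selected_titles = [p.title for p in selected]
```

With these sample papers, only "Recent cited ML paper" passes all three filters.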
When authors release papers on Arxiv, they choose one or more categories for the papers, such as Computer Vision and Pattern Recognition, Machine Learning, Human-Computer Interaction, or Artificial Intelligence. There are over 150 categories in the taxonomy. When you select a category in the filter dropdown, you filter all papers in the dataset classified within that category by their authors.
In the dropdown, the system lists all primary categories from all papers in the dataset. The number beside every category is the number of papers with that category as its primary category.
When you filter papers using the setting "Released on past 6 months", you will see only papers whose first release date falls between 6 months ago and today. The same applies to all options in this filter.
There are 2 special options in this filter: "since beginning" and "past 24 hours". When you select "since beginning", you will see all papers in the dataset. When you select "past 24 hours", you will see only the most recent papers.
When you select "show only cited papers", you will see only papers that have been cited by at least 1 other paper in the dataset. Conversely, if you apply "only uncited papers", you will see only papers that have not been cited by any other paper in the dataset.
One of the main benefits of Trending Papers is the ability to sort papers by meaningful metrics. You can sort papers:
To do that, just click on the column header.
When you search for a query, the system computes i) the cosine similarity and ii) the cover density ranking between the query and all the papers in the dataset. Then, the system combines these 2 features with others, such as PageRank, to output a Relevance metric for all papers in the dataset, and shows the papers most relevant to the given query.
The Relevance number is a metric that estimates how relevant a paper is to a given search query. It is based on the cosine similarity and the cover density ranking between the query and that particular paper. The system takes a weighted average of these two numbers and then boosts it by a factor that is mainly a function of the paper's PageRank weight (which is why the Relevance is sometimes a percentage above 100%).
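To make the shape of this computation concrete, here is a rough Python sketch. It is not the site's actual formula, which is not published here: the equal weighting, the `pagerank_boost` multiplier, the function names, and the bag-of-words cosine similarity are all assumptions, and the cover density ranking is taken as an input rather than computed.

```python
import math
from collections import Counter


def cosine_similarity(query, text):
    """Cosine similarity between bag-of-words term-count vectors."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[word] * t[word] for word in q)
    norm_q = math.sqrt(sum(count * count for count in q.values()))
    norm_t = math.sqrt(sum(count * count for count in t.values()))
    if norm_q == 0 or norm_t == 0:
        return 0.0
    return dot / (norm_q * norm_t)


def relevance(cosine, cover_density, pagerank_boost, w_cosine=0.5):
    """Weighted average of the two text scores, boosted by PageRank.

    Assumptions: both scores lie in [0, 1], `w_cosine` is a hypothetical
    weight, and `pagerank_boost` is a hypothetical multiplier (>= 1)
    derived from the paper's PageRank weight. Returns a percentage.
    """
    base = w_cosine * cosine + (1.0 - w_cosine) * cover_density
    return base * pagerank_boost * 100.0


sim = cosine_similarity("graph neural networks", "a survey of graph neural networks")
boosted = relevance(cosine=0.9, cover_density=0.9, pagerank_boost=1.2)
plain = relevance(cosine=0.9, cover_density=0.9, pagerank_boost=1.0)
```

Because the boost multiplies the weighted average, a paper that already matches the query closely and also has a high PageRank weight can score above 100%, as `boosted` does in this example.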
For now, this formula is fixed. However, once we gather enough click data, we intend to train a Machine Learning model to rank the results based on these features and click data.
The date shown on each paper is its first release date, that is, the date the authors first uploaded the paper to Arxiv. Authors often update their papers and re-upload them; the date displayed here stays the same, showing the first release date rather than the date of the latest update.
The system uses an LLM to summarize each paper's abstract in 1-2 sentences, and the summary is displayed alongside the paper.
Creating an account is free and always will be: just click on the button in the upper right corner of the screen and follow the prompts. The log-in and log-out buttons are in the same region of the screen.
The system is being built to show you not only papers that are relevant in general but also papers that are relevant to what you find interesting. We will show personalized reading lists based on your taste, so you can discover what is worth reading first: we believe personalization is key. That's why the system offers account creation and login, so that it can learn your taste based on what you click.
To star or unstar a paper, just click on the star beside the paper title.
When you star a paper, you let the system know which kinds of papers you like. The system uses this information to recommend reading lists that you might find interesting but might not have seen yet.
The "For you" section is still a work in progress, and will be released over the coming weeks.
The section will show you personalized reading lists based on your taste. The system infers your taste from your activity on the site: what you click, which filters you use, which papers you star, what you search for, etc.
If your question could not be answered above, please get in touch with us via email: contact@trendingpapers.com