I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.
To find questions for the FAQ, I started with a set of relevant Quora topics, and collected the questions under those topics. That gave me a master list of questions. But because of question duplication and other Quora data quality problems, I needed ways to rank the questions in the list so I could focus my efforts on the best ones. One data point I use is follower count. This gives me the questions that the most people are interested in, whether or not they have good answers. This week I’m going to look at the answer upvotes metric, an indicator of answer quality. Since a FAQ has both questions and answers, it’s important to identify good answers as well as good questions.
Most Viewed Writers
Major Quora topics (those with enough activity) have a page listing the top 50 writers for that topic, based on how many answer views each writer received in the past 30 days. Here is that page for competitive programming: Most Viewed Writers in Competitive Programming. For each writer, the page provides a link to the answers that writer has submitted for the topic. As of this moment, Bohdan Pryshchenko is the #1 most viewed writer in the Competitive Programming topic, with 128,640 views and 355 answers. (His view count is higher than the #2 writer’s by a wide margin, but since the rankings cover only the past 30 days, it’s not too hard for people to move around on the list.)
Each person’s answers link leads to a page with their answers (and the associated questions) for the topic. If you’re logged in to Quora while viewing that page, each answer has an Upvote and a Downvote button, and the Upvote button shows the number of upvotes that the answer has. This week, I experimented with using that number to rank questions.
To collect the data, I used my standard technique: save the pages as HTML, and use XPath to extract the desired elements. From previous collection exercises, I already had the XPath to get the question title and link. So I just had to add XPath for the upvote count. Writing XPath for Quora pages is mainly a matter of following nested divs and spans to get to the desired value:
.//div[contains(@class, 'AnswerListItem')]
//div[contains(@class, 'hidden')]
//div[contains(@class, 'icon_action_bar-label')]
//span[contains(@class, 'icon_action_bar-count')]/span[2]
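To make the nested walk concrete, here is a minimal Python sketch that follows the same four steps as the XPath above. It is not the code I used for collection. It uses only the standard library, whose XPath subset has no contains() function, so a small hypothetical helper (descendants_with_class) emulates that test by hand. The sample markup is invented and far simpler than a real saved Quora page, which would probably need a more forgiving HTML parser. The sketch also assumes the count appears as a plain integer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for one saved answer-list page.
SAMPLE_PAGE = """
<html><body>
  <div class="AnswerListItem">
    <a class="question_link" href="/example-question">Example question?</a>
    <div class="hidden">
      <div class="icon_action_bar-label">
        <span class="icon_action_bar-count"><span>Upvote</span><span>42</span></span>
      </div>
    </div>
  </div>
</body></html>
"""


def descendants_with_class(node, tag, fragment):
    """Emulate //tag[contains(@class, 'fragment')] relative to node."""
    return [el for el in node.iter(tag)
            if el is not node and fragment in el.get("class", "")]


def upvote_counts(root):
    """Follow the four nested steps and return the upvote counts found."""
    counts = []
    seen = set()  # guard against reaching the same span by two paths
    for item in descendants_with_class(root, "div", "AnswerListItem"):
        for hidden in descendants_with_class(item, "div", "hidden"):
            for label in descendants_with_class(hidden, "div", "icon_action_bar-label"):
                for count in descendants_with_class(label, "span", "icon_action_bar-count"):
                    spans = count.findall("span")  # direct child spans only
                    if len(spans) >= 2 and id(spans[1]) not in seen:
                        seen.add(id(spans[1]))
                        # Assumption: the text is a plain integer, possibly with commas.
                        counts.append(int(spans[1].text.replace(",", "")))
    return counts


root = ET.fromstring(SAMPLE_PAGE.strip())
print(upvote_counts(root))
```

With a library that supports full XPath 1.0, the helper would be unnecessary and the expression could be passed in directly.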
Upvote and Follower Counts
From the 50 writers on the Most Viewed Writers page for Competitive Programming, I got about 3000 unique questions. For each question, I retrieved the number of upvotes for particular answers, using the XPath shown above. From earlier work, I had the follower counts for many of the same questions. As a simple heuristic, I averaged the two values to get a score that combines interest in the question with appreciation for the answer, with the two metrics weighted equally.
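The scoring heuristic is simple enough to sketch in a few lines of Python. The function names, question keys, and numbers below are made up for illustration. The sketch also makes two assumptions the description above leaves open: each question carries a single upvote value, and questions missing either metric are left out of the ranking.

```python
def combined_score(followers, upvotes):
    """Average the two metrics so each carries equal weight."""
    return (followers + upvotes) / 2


def rank_questions(follower_counts, upvote_counts):
    """Return (question, score) pairs sorted from highest to lowest score.

    Assumption: only questions present in both dictionaries are ranked.
    """
    shared = follower_counts.keys() & upvote_counts.keys()
    scored = [(q, combined_score(follower_counts[q], upvote_counts[q]))
              for q in shared]
    # Sort by descending score, breaking ties by question key.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data, not real Quora numbers.
followers = {"Q1": 1200, "Q2": 300, "Q3": 50}
upvotes = {"Q1": 100, "Q2": 1500, "Q4": 10}

for question, score in rank_questions(followers, upvotes):
    print(question, score)
```

Because both metrics get equal weight, ranking by the average gives the same order as ranking by the plain sum of followers and upvotes.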
Results
Here are a few interesting results from question/answer pairs that are high on the followers + upvotes ranking.
High on the list are three questions about one person: competitive programmer Anudeep Nekkanti. The answers (written by Anudeep Nekkanti himself) provide competitive programming advice from his perspective:
- What was Anudeep Nekkanti’s Competitive Programming strategy to become 35th in Global ranking, in just 6-7 months?
- How did Anudeep Nekkanti become so good at competitive programming?
- How come Anudeep Nekkanti (a great competitive programmer from india) is at ANITS (an unknown college)?
This question has an answer from Adam D’Angelo, former competitive programmer and current CEO of Quora:
Here’s a long list of programming interview questions:
Here’s a long list of recommended algorithms and data structures, and how to learn and practice them:
And a question list like this wouldn’t be complete without one of these questions, which people never tire of asking:
More than follower count, the upvote count is a good way to find useful content and to separate the best answers from the rest.
(Image credit: Andrew Frier)