## CPFAQ: Most Viewed Writers, Part 2

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Last week, I described using a Quora topic’s Most Viewed Writers page to collect questions with popular answers. I started with the Competitive Programming topic. This week, I’m moving on to most viewed writers in related topics.

## Related Topics

Quora topics are organized in an ontology, a hierarchy of topics and subtopics. But topics can be created by anyone, and new topics don’t have to be added to the ontology. Furthermore, popular topics (like the Competitive Programming topic) are locked, which means only Topic Gnomes are allowed to add direct child topics (though anyone can add topics further down in the hierarchy, if they can find an appropriate unlocked parent).
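
A rough sketch of what traversing the ontology could look like, assuming the hierarchy has already been captured as a parent-to-children mapping. The `TOPIC_CHILDREN` data and the `collect_topics` helper are hypothetical names for illustration, and every topic in the sample other than the first parent/child pair is made up.

```python
from collections import deque

# Hypothetical slice of a topic hierarchy: parent topic -> direct child topics.
# Only the first parent/child pair is a real example; the rest is placeholder data.
TOPIC_CHILDREN = {
    "Competitive Programming": [
        "Algorithms in Competitive Programming",
        "Example Contest Topic",
    ],
    "Algorithms in Competitive Programming": ["Example Subtopic"],
}


def collect_topics(root, children):
    """Return the root topic and all of its descendants in breadth-first order."""
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        topic = queue.popleft()
        for child in children.get(topic, []):
            # A topic can be reachable through more than one parent, so skip repeats.
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


topics = collect_topics("Competitive Programming", TOPIC_CHILDREN)
```

A traversal like this only reaches topics that are linked into the hierarchy. Topics that were never added to the ontology would have to be appended to the list some other way.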

By traversing the ontology and using other techniques to find topics outside of the ontology, I came up with a list of 138 topics that I have been using to collect questions and answers. This week, I updated my scraper to traverse that topic list to download pages of the form https://www.quora.com/topic/[topic-name]/writers (the Most Viewed Writers page). Out of the list of 138 topics, I found 30 that had /writers pages. Although Quora doesn’t disclose how it decides which topics get a Most Viewed Writers page, users have come up with various theories. At a minimum, a topic must have some reasonable level of activity (questions and views) for the MVW page to be meaningful.
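
Here is a minimal sketch of the URL-building step. It assumes each topic is stored as the slug that appears in its Quora URL. The `page_exists` callable is a stand-in for whatever download-and-check logic a scraper uses, and it is stubbed here so the example runs without network access. The slugs in the sample are illustrative only.

```python
def writers_url(topic_slug):
    """Build the Most Viewed Writers URL for a topic slug."""
    return f"https://www.quora.com/topic/{topic_slug}/writers"


def topics_with_writers_page(topic_slugs, page_exists):
    """Keep only the topics whose /writers URL passes the page_exists check."""
    return [slug for slug in topic_slugs if page_exists(writers_url(slug))]


# Stub for the download step: pretend only one topic has a /writers page.
# A real scraper would request the URL and inspect the response instead.
KNOWN_PAGES = {"https://www.quora.com/topic/Competitive-Programming/writers"}

found = topics_with_writers_page(
    ["Competitive-Programming", "Example-Low-Activity-Topic"],
    page_exists=lambda url: url in KNOWN_PAGES,
)
```

Passing the check in as a function keeps the filtering logic separate from the HTTP details, which makes it easy to test against a fixed set of URLs.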

Once I had the 30 pages, I used the XPath technique described last week to get links to the answers written by the most viewed writers. One difference from last week is that I only extracted links for the top 10 writers for each topic, since that’s all an anonymous user (like my automated tool) can see.
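
The extraction step might look something like the sketch below. It uses the limited XPath support in Python’s standard `xml.etree.ElementTree` module on a small well-formed sample. The markup, the `writer` and `answer_link` class names, and the link paths are all invented for illustration and are not Quora’s actual page structure. A real page would likely need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for a downloaded /writers page.
SAMPLE_PAGE = """<html><body>
  <div class="writer">
    <a class="answer_link" href="/Example-question-1/answer/Writer-One">answer</a>
    <a class="answer_link" href="/Example-question-2/answer/Writer-One">answer</a>
  </div>
  <div class="writer">
    <a class="answer_link" href="/Example-question-3/answer/Writer-Two">answer</a>
  </div>
</body></html>"""


def top_writer_answer_links(page_source, max_writers=10):
    """Return answer links for the first max_writers writer blocks on the page."""
    root = ET.fromstring(page_source.strip())
    writers = root.findall(".//div[@class='writer']")
    links = []
    for writer in writers[:max_writers]:
        for anchor in writer.findall(".//a[@class='answer_link']"):
            href = anchor.get("href")
            if href:
                links.append(href)
    return links


all_links = top_writer_answer_links(SAMPLE_PAGE)
first_writer_only = top_writer_answer_links(SAMPLE_PAGE, max_writers=1)
```

The `max_writers` default of 10 mirrors the limit on what an anonymous user can see.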

## Results

Because competitive programming questions are often tagged with the top-level Competitive Programming topic, I got good question coverage just by looking at that topic. Last week, I found over 3000 questions with answers written by the 50 most viewed writers in the topic. However, more questions can be found by exploring related topics. I found about 1300 more by looking at the 10 most viewed writers in relevant topics with /writers pages. Here are some examples.

Although Dr. Thomas Cormen (co-author of the famous textbook Introduction to Algorithms) never participated in competitive programming himself, he appreciates the creative thinking that goes into solving programming puzzles. He doesn’t think competitive programming is for everyone, though, so he doesn’t emphasize it in his introductory classes.

Adam D’Angelo is CEO of Quora, a former competitive programmer, and a top writer in related topics. He wrote popular answers to these questions.

On the subject of getting started with competitive programming, Nikhil Garg has several pieces of advice, including a suggestion to rewrite your solution from scratch after you get it accepted the first time.

On the subject of time management, two answers to this question suggest keeping a problem in your head so you can work on it when you’re not in front of a computer.

Last week I linked to a popular question about which algorithms and data structures to learn for competitive programming. Here’s a similar question about mathematics topics.

(Image credit: John Bell)

## CPFAQ: Most Viewed Writers

To find questions for the FAQ, I started with a set of relevant Quora topics, and collected the questions under those topics. That gave me a master list of questions. But because of question duplication and other Quora data quality problems, I needed ways to rank the questions in the list so I could focus my efforts on the best ones. One data point I use is follower count. This gives me the questions that the most people are interested in, whether or not they have good answers. This week I’m going to look at the answer upvotes metric, an indicator of answer quality. Since a FAQ has both questions and answers, it’s important to identify good answers as well as good questions.
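
As a sketch of how the two metrics give different orderings, the snippet below sorts a tiny question list by each one. The `Question` record, its field names, and all of the sample numbers are hypothetical. They are not taken from the real question database.

```python
from dataclasses import dataclass


@dataclass
class Question:
    title: str
    followers: int           # how many people follow the question
    top_answer_upvotes: int  # upvotes on its most upvoted answer


def rank_by(questions, metric):
    """Return the questions sorted from highest to lowest on the given field."""
    return sorted(questions, key=lambda q: getattr(q, metric), reverse=True)


# Made-up sample data.
questions = [
    Question("Example question A", followers=900, top_answer_upvotes=40),
    Question("Example question B", followers=150, top_answer_upvotes=700),
    Question("Example question C", followers=400, top_answer_upvotes=5),
]

by_interest = [q.title for q in rank_by(questions, "followers")]
by_answer_quality = [q.title for q in rank_by(questions, "top_answer_upvotes")]
```

In this made-up data, question A has the most followers but question B has the best-received answer, which is the kind of gap the second metric is meant to expose.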

## CPFAQ: Codeforces in Wikipedia

In defining FAQ-related concepts like question categories, I always check Wikipedia to see what it has to say on a topic. For competitive programming, Wikipedia coverage can be uneven, and some articles start with the dreaded Wikipedia warnings about questionable notability or excessive reliance on primary sources. But Wikipedia is a quick way to verify that, for example, HackerRank is written in camel case, but Topcoder isn’t. (There’s no way to tell from the Topcoder logo.)

As I was browsing competitive programming topics, I noticed that Codeforces (also not written in camel case) had no Wikipedia article. I thought that was strange, considering that Codeforces is one of the best-known online judges. So I decided to see what was up.

## CPFAQ: Question Categories

As I mentioned last week, I have been writing and categorizing canonical questions for the FAQ. In this week’s post, I’ll describe the categories I have so far.

I have written before about collecting, cleaning up, and classifying topics in the Quora topic ontology. Although I think I’ll be able to improve the QTO using my ongoing categorization work, there are a few differences between Quora topics and the categories described in this post:

• Although Quora topics are arranged in a hierarchy, child topic names often include the parent topic name or other identifiers. For example, Algorithms in Competitive Programming is a child topic of Competitive Programming. They can’t just name it Algorithms because there’s already a topic with that name. For my purposes, I don’t have to worry about disambiguating my topic names, since they’re all implicitly related to the master topic of Competitive Programming.
• Quora topics can be nested to (as far as I know) any depth. For my categories, I’m sticking with a fixed four-layer depth: the implicit master topic (competitive programming), the primary category (described below), the canonical question title, and the Quora question title.
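
One way to picture the fixed four-layer depth is as a nested mapping, sketched below. The category name, canonical question, and Quora question titles are placeholders, and the `flatten` helper is a hypothetical name used only for this illustration.

```python
# Layer 1 (implicit master topic) is passed in as a default argument.
# Layer 2: primary category -> Layer 3: canonical question -> Layer 4: Quora titles.
FAQ = {
    "Example Category": {
        "Example canonical question?": [
            "Example Quora question title 1?",
            "Example Quora question title 2?",
        ],
    },
}


def flatten(faq, master_topic="competitive programming"):
    """Expand the nested mapping into one four-field row per Quora question."""
    rows = []
    for category, canonicals in faq.items():
        for canonical, quora_titles in canonicals.items():
            for title in quora_titles:
                rows.append((master_topic, category, canonical, title))
    return rows


rows = flatten(FAQ)
```

Because the depth is fixed, every row has exactly four fields, which is not something arbitrarily nested Quora topics can guarantee.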

## CPFAQ: Canonical Questions

In recent weeks, I’ve been using text mining techniques to analyze a set of Quora questions and look for patterns. This week, I’m taking a more manual approach to analyzing the question database.
