CPFAQ: Most Viewed Writers

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

To find questions for the FAQ, I started with a set of relevant Quora topics and collected the questions under those topics. That gave me a master list of questions. But because of question duplication and other Quora data quality problems, I needed ways to rank the questions in the list so I could focus my efforts on the best ones. One metric I use is question follower count, which identifies the questions that the most people are interested in, whether or not they have good answers. This week I'm going to look at the answer upvotes metric, an indicator of answer quality. Since a FAQ has both questions and answers, it's important to identify good answers as well as good questions.
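A minimal sketch of the two rankings, assuming each question is stored as a hypothetical `Question` record with a follower count and a list of per-answer upvote counts. The sample questions and numbers are made up for illustration. Scoring a question by its single most-upvoted answer is an assumption; summing upvotes across all answers would be another reasonable choice.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    # Hypothetical record for one question in the master list.
    title: str
    followers: int
    answer_upvotes: list = field(default_factory=list)


def rank_by_followers(questions):
    """Most-followed questions first: a proxy for reader interest."""
    return sorted(questions, key=lambda q: q.followers, reverse=True)


def rank_by_top_answer(questions):
    """Questions whose best answer has the most upvotes first: a proxy for answer quality.

    Questions with no answers score 0.
    """
    return sorted(questions, key=lambda q: max(q.answer_upvotes, default=0), reverse=True)


# Made-up sample data, not real Quora numbers.
questions = [
    Question("How do I start competitive programming?", 500, [120, 40]),
    Question("Is competitive programming useful for interviews?", 300, [900]),
    Question("What is Codeforces?", 50, []),
]

for q in rank_by_followers(questions):
    print(q.followers, q.title)
for q in rank_by_top_answer(questions):
    print(max(q.answer_upvotes, default=0), q.title)
```

The two orderings can disagree: a question many people follow may have only mediocre answers, which is why it helps to look at both metrics.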

CPFAQ: Codeforces in Wikipedia

In defining FAQ-related concepts like question categories, I always check Wikipedia to see what it has to say on a topic. For competitive programming, Wikipedia coverage can be uneven, and some articles start with the dreaded Wikipedia warnings about questionable notability or excessive reliance on primary sources. But Wikipedia is a quick way to verify that, for example, HackerRank is written in camel case but Topcoder isn't (there's no way to tell from the Topcoder logo).

As I was browsing competitive programming topics, I noticed that Codeforces (also not written in camel case) had no Wikipedia article. I thought that was strange, considering that Codeforces is one of the best-known online judges. So I decided to see what was up.

CPFAQ: Question Categories

As I mentioned last week, I have been writing and categorizing canonical questions for the FAQ. In this week’s post, I’ll describe the categories I have so far.

I have written before about collecting, cleaning up, and classifying topics in the Quora topic ontology (QTO). Although I think I'll be able to improve the QTO using my ongoing categorization work, the categories described in this post differ from Quora topics in a few ways:

  • Although Quora topics are arranged in a hierarchy, child topic names often include the parent topic name or other identifiers. For example, Algorithms in Competitive Programming is a child topic of Competitive Programming. Quora can't simply call it Algorithms, because a topic with that name already exists. For my purposes, I don't have to worry about disambiguating my category names, since they're all implicitly related to the master topic of Competitive Programming.
  • Quora topics can be nested to (as far as I know) any depth. For my categories, I’m sticking with a fixed four-layer depth: the implicit master topic (competitive programming), the primary category (described below), the canonical question title, and the Quora question title.
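Because the depth is fixed, the whole hierarchy fits in a nested dictionary. The sketch below is a hypothetical illustration: `build_faq` is an invented helper, the master topic is left implicit, and the category name and question titles are placeholders rather than the actual categories in this post.

```python
from collections import defaultdict


# Hypothetical sketch of the fixed four-layer hierarchy. The master topic
# (competitive programming) is implicit, so only three levels are stored:
# primary category -> canonical question title -> list of Quora question titles.
def build_faq(rows):
    """Group (category, canonical_title, quora_title) triples into a nested dict."""
    faq = defaultdict(lambda: defaultdict(list))
    for category, canonical, quora in rows:
        faq[category][canonical].append(quora)
    return faq


# Placeholder rows for illustration only.
rows = [
    ("Getting Started", "How do I start competitive programming?",
     "How can I start competitive programming?"),
    ("Getting Started", "How do I start competitive programming?",
     "What is the best way to begin competitive programming?"),
    ("Getting Started", "Which online judge should I use first?",
     "Which online judge is best for beginners?"),
]

faq = build_faq(rows)
for category, canonicals in faq.items():
    print(category)
    for canonical, quora_titles in canonicals.items():
        print("  " + canonical)
        for title in quora_titles:
            print("    " + title)
```

Unlike the QTO, which can nest to arbitrary depth, this structure never needs a general tree: several Quora questions collapse into one canonical question, and each canonical question belongs to exactly one primary category.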

CPFAQ: Canonical Questions

In recent weeks, I’ve been using text mining techniques to analyze a set of Quora questions and look for patterns. This week, I’m taking a more manual approach to analyzing the question database.