CPFAQ: Canonical Question Statistics, Part 2

Question Stats

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

I have now classified over 1000 Quora questions, using 552 canonical titles, and I think that’s a good point at which to move on to some other CPFAQ tasks. But first, here’s how the numbers look compared with my previous checkpoint a few weeks ago.

Statistics

Not surprisingly, the question distribution has changed a bit. The top 25% of classified questions is now spread across 12 unique titles instead of 10. The top 3 canonical question titles are the same, but the rest have moved around, including one that got bumped out of the top 10. Here are the questions that have moved into the top 10 (and into the FAQ):

As for the next 25%, that group now includes titles that have only two Quora questions each. Last time, titles with two questions were all in the bottom 50%, which means questions have bunched up a bit towards the left side of the distribution, where the most popular titles are. However, the long tail is still there, with many canonical titles associated with only one Quora question.
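For readers who want to reproduce this kind of breakdown, here is a minimal sketch of how the "top 25%" figure could be computed. It assumes the classification is available as a plain list with one canonical title per classified question. The function name `titles_covering` and the sample titles are made up for illustration; this is not the spreadsheet formula I actually used.

```python
from collections import Counter


def titles_covering(assignments, fraction):
    """Return the most popular canonical titles (with their question counts)
    needed to cover at least `fraction` of all classified questions."""
    counts = Counter(assignments)
    target = fraction * sum(counts.values())
    covered = 0
    chosen = []
    # most_common() yields titles from most to least popular.
    for title, n in counts.most_common():
        if covered >= target:
            break
        chosen.append((title, n))
        covered += n
    return chosen


# Hypothetical sample: one canonical title per classified question.
assignments = ["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D", "E"]

top_quarter = titles_covering(assignments, 0.25)
top_half = titles_covering(assignments, 0.50)

# Titles in the long tail: those matched by exactly one question.
singletons = [t for t, n in Counter(assignments).items() if n == 1]

print(len(top_quarter), "title(s) cover the top 25% of questions")
print(len(top_half), "title(s) cover the top 50% of questions")
print(len(singletons), "title(s) have only one question")
```

Walking the titles in popularity order and stopping once the running total reaches the target is what makes the "left side of the distribution" concrete: the fewer titles needed, the more bunched up the questions are.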

Categorization Tools

I’ve been using Excel to record a primary category and canonical title for each Quora question in my list. But with over 15,000 more questions to categorize, a more specialized tool would be more efficient. I have in mind a simple tool that reads the current list of Quora question titles and URLs, along with the primary category and canonical title for categorized questions, and presents a UI with these elements:

  • Question title, linked to its Quora page.
  • A radio button for each primary category, with the current category selected. This provides a one-click way to categorize the question.
  • A list of canonical titles, with the most popular titles at the top of the list, and a textbox to add a new title. If an existing canonical title applies, it takes one click to select it. Otherwise, I can type a new one.
  • Next/Previous buttons to navigate through the list of questions.
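The state behind a UI like this could be sketched as follows. This is only a rough model under my own assumptions, not a finished design: the `Question` and `Categorizer` names, the field names, the sample category, and the example URLs are all hypothetical, and the actual UI layer (radio buttons, textbox, buttons) is left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    title: str
    url: str
    category: Optional[str] = None          # primary category, if assigned
    canonical_title: Optional[str] = None   # canonical title, if assigned


class Categorizer:
    """Holds the question list and the position of the question on screen."""

    def __init__(self, questions: List[Question]):
        if not questions:
            raise ValueError("need at least one question")
        self.questions = list(questions)
        self.index = 0

    @property
    def current(self) -> Question:
        return self.questions[self.index]

    def next(self) -> Question:
        # Next button: stop at the end of the list instead of wrapping.
        self.index = min(self.index + 1, len(self.questions) - 1)
        return self.current

    def previous(self) -> Question:
        # Previous button: stop at the start of the list.
        self.index = max(self.index - 1, 0)
        return self.current

    def ranked_titles(self) -> List[str]:
        """Canonical titles in use, most popular first."""
        counts = Counter(
            q.canonical_title for q in self.questions if q.canonical_title
        )
        return [title for title, _ in counts.most_common()]

    def assign(self, category: Optional[str] = None,
               canonical_title: Optional[str] = None) -> None:
        """Record a one-click category or title choice for the current question."""
        if category is not None:
            self.current.category = category
        if canonical_title is not None:
            self.current.canonical_title = canonical_title


# Hypothetical usage with made-up questions and placeholder URLs.
tool = Categorizer([
    Question("How do I begin?", "https://example.com/q1"),
    Question("Where should I start?", "https://example.com/q2"),
    Question("Which language is best?", "https://example.com/q3"),
])
tool.assign(category="Getting Started", canonical_title="How do I get started?")
tool.next()
tool.assign(canonical_title="How do I get started?")
tool.next()
tool.assign(canonical_title="Which language should I use?")
print(tool.ranked_titles())
```

Typing a new title into the textbox and clicking an existing one both go through the same `assign` call, so a newly added title shows up in `ranked_titles()` the next time the list is drawn.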

This tool will make it easier to navigate through the question list and quickly assign primary categories and canonical titles. For my next categorization push, I think I’ll skip the long tail questions and focus on the more popular ones. For the FAQ, pages that summarize multiple Quora questions are the most useful. If a canonical title only applies to one or two Quora questions, that title probably doesn’t describe a frequently asked question.

(Image credit: russellstreet)