CPFAQ: The Value of Canonical Questions

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Last week I discussed how question merging works for Quora and CPFAQ. Related to question merging is the idea of canonical questions. Although I have written about canonical questions in the past, I haven’t explained why they’re critical for CPFAQ. That’s the topic for this week.

CPFAQ: Merging Questions

If a CPFAQ page has a canonical title and contains a list of Quora questions that all relate to the title, why not just merge all the Quora questions into one canonical Quora question? Good question.

CPFAQ: Canonical Question Statistics

As I mentioned last week, I’m currently creating FAQ pages, and those FAQ pages rely on canonical question titles. This week I’ll discuss some observations about the set of titles I have so far.

CPFAQ: Creating a FAQ Page

For at least the next few weeks, I’ll be creating competitive programming FAQ pages for the most frequently asked competitive programming questions, according to my analysis of Quora content. That set of pages will give me a foundation on which to add more specialized questions over time. This week, I’ll explain the page creation process that I’m currently using.

CPFAQ: Adding Wiki Pages

We’re officially halfway through the year, as measured by weekly blog posts. That means I’m also halfway through the CPFAQ project. As I mentioned last week, I’m building the Competitive Programming FAQ inside a MediaWiki site. This week, I added a few more pages to the wiki. My plan is first to focus on the questions, and later in the year to work on the answers. So the FAQ pages will initially just contain pointers to Quora questions (along with their answers), and will later include answer text in the wiki itself.
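Since the first version of each FAQ page is just a list of pointers to Quora questions, generating its wikitext is mechanical. Here is a minimal sketch, assuming a hypothetical helper `faq_page_wikitext` that takes a canonical title and a list of Quora question URLs. The link text is derived from the URL slug, which only approximates the real question title, and the page layout is an assumption. It relies only on standard MediaWiki markup: `==` section headings, `*` bullets, and `[URL text]` external links.

```python
from urllib.parse import unquote


def faq_page_wikitext(canonical_title, quora_urls):
    """Build wikitext for a FAQ page that, for now, only points to Quora questions.

    Hypothetical helper for illustration; the page layout is an assumption.
    """
    lines = [
        f"This page collects Quora questions related to: {canonical_title}",
        "",
        "== Quora questions ==",
    ]
    for url in quora_urls:
        # Approximate the question text from the URL slug,
        # e.g. ".../How-do-I-start" -> "How do I start".
        slug = unquote(url.rstrip("/").rsplit("/", 1)[-1])
        text = slug.replace("-", " ")
        # MediaWiki external link syntax: [URL link text]
        lines.append(f"* [{url} {text}]")
    return "\n".join(lines) + "\n"


print(faq_page_wikitext(
    "How do I get started with competitive programming?",
    ["https://www.quora.com/How-do-I-start-competitive-programming"],
))
```

When answer text is added later, it could go in its own section above the list of Quora links.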

CPFAQ: CPWiki

With the halfway point of 2018 approaching, it’s time to focus on the website that will host the content for the CPFAQ. I decided a few months ago that I would use MediaWiki software to host the FAQ. The advantage of a wiki is that it will allow me to write encyclopedia-style pages to supplement the main FAQ pages. This week, I have been thinking about how I want to organize the wiki, and I’ve created a few pages to get things started.

CPFAQ: Document Classification

To organize my list of Quora questions, I have started giving each one a primary category that indicates its main subject. For example, the primary category for “How can I sharpen my mathematical skills in the context of competitive programming?” is Mathematics (with Competitive Programming as the implicit overall topic for all questions).

On Quora, categories are known as topics, and they are assigned to questions by (1) Quora users and (2) the Quora Topic Bot (QTB), an automated process. But topic assignments are often inaccurate. For topics assigned by users, there are a few sources of inaccuracy. First, most question askers don’t think much about correct topic assignment; they are just trying to get their question answered. Second, they often spam the question with as many topics as possible because they think it will increase the probability of it being answered. For topics assigned by QTB, the main problem is that machine learning algorithms still aren’t perfect at assigning topics, and they can be misled by users’ topic assignment behavior.

Using a set of Quora questions that I categorized myself, I thought it would be interesting to see what kind of auto-categorization results I could get using some free text classifiers.
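This excerpt doesn’t say which classifiers were tried, so as a generic illustration of the idea, here is a minimal multinomial naive Bayes classifier written from scratch, plus an accuracy check against hand-assigned categories. It is not one of the classifiers used in the post, and the training questions and category names are made-up examples (the first is adapted from the example above), not data from the actual experiment.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training questions
        self.vocab = set()

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def predict(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Work in log space to avoid underflow; start from the log prior.
            score = math.log(n_docs / total_docs)
            for w in words:
                # Add-one smoothing keeps unseen words from zeroing out the score.
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


def accuracy(model, labeled):
    """Fraction of (question, hand-assigned category) pairs the model gets right."""
    correct = sum(model.predict(q) == cat for q, cat in labeled)
    return correct / len(labeled)


# Hypothetical hand-labeled training questions.
training = [
    ("How can I sharpen my mathematical skills", "Mathematics"),
    ("Which number theory topics should I learn", "Mathematics"),
    ("How do I get started with competitive programming", "Getting Started"),
    ("What is the best way to start competitive programming as a beginner", "Getting Started"),
]
model = NaiveBayes()
for question, category in training:
    model.train(question, category)
```

With a real hand-labeled set, the useful number is accuracy on questions held out from training, not on the training questions themselves.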

CPFAQ: Listening for New Questions

Quora doesn’t provide a page where a user can see every new question for the topics that they follow. Like other social media companies, Quora believes that the best way to present content to users is in the form of a “feed.” This feed is not just a reverse chronological list of new posts. Rather, it’s the output of an algorithm that considers multiple factors to determine what to show the user.

There’s an ongoing debate, which I won’t get into here, about the wisdom of allowing a secret algorithm to control what you see online. But regardless of the overall pros and cons of an algorithmic feed, there are definitely drawbacks to using it to maintain a canonical list of questions. This week, I’ll discuss an alternative process for Quora content.
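This excerpt doesn’t describe the alternative process itself. As one generic way to avoid depending on the feed, one could periodically fetch a topic’s question list and keep only the URLs not recorded before. The sketch below assumes the fetching happens elsewhere and that questions are identified by URL; `find_new_questions` and the sample URLs are hypothetical.

```python
def find_new_questions(known, fetched):
    """Return URLs in `fetched` that are not in `known`, in the order first seen.

    `known` is a set of previously recorded question URLs; it is updated in place
    so that a question appearing twice in one batch is reported only once.
    """
    new = []
    for url in fetched:
        if url not in known:
            known.add(url)
            new.append(url)
    return new


known = {"https://www.quora.com/What-is-competitive-programming"}
batch = [
    "https://www.quora.com/What-is-competitive-programming",
    "https://www.quora.com/How-do-I-prepare-for-ICPC",
    "https://www.quora.com/How-do-I-prepare-for-ICPC",
]
print(find_new_questions(known, batch))
# prints ['https://www.quora.com/How-do-I-prepare-for-ICPC']
```

Unlike a feed, this reports every question that wasn’t seen before, so nothing is filtered out by a ranking algorithm.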

CPFAQ: Most Viewed Writers, Part 2

Last week, I described using a Quora topic’s Most Viewed Writers page to collect questions with popular answers. I started with the Competitive Programming topic. This week, I’m moving on to most viewed writers in related topics.

Related Topics

Quora topics are organized in an ontology, a hierarchy of topics and subtopics. But topics can be created by anyone, and new topics don’t have to be added to the ontology. Furthermore, popular topics (like the Competitive Programming topic) are locked, which means only Topic Gnomes are allowed to add direct child topics (though anyone can add topics further down in the hierarchy, if they can find an appropriate unlocked parent).

By traversing the ontology and using other techniques to find topics outside of the ontology, I came up with a list of 138 topics that I have been using to collect questions and answers. This week, I updated my scraper to traverse that topic list to download pages of the form https://www.quora.com/topic/[topic-name]/writers (the Most Viewed Writers page). Out of the list of 138 topics, I found 30 that had /writers pages. Although Quora doesn’t disclose how it decides which topics get a Most Viewed Writers page, users have come up with various theories. At a minimum, a topic must have some reasonable level of activity (questions and views) for the MVW page to be meaningful.
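The traversal and URL-building steps can be sketched roughly as follows. This is a minimal illustration, not the actual scraper: the hypothetical `children` map stands in for however the ontology was read, the topic names are examples, and `writers_url` assumes Quora topic URLs use hyphens in place of spaces. It only covers topics reachable through the ontology, not those found by other techniques, and it leaves out the actual page download.

```python
from collections import deque


def traverse_topics(children, roots):
    """Breadth-first walk of a topic hierarchy; returns each reachable topic once."""
    seen = set(roots)
    order = []
    queue = deque(roots)
    while queue:
        topic = queue.popleft()
        order.append(topic)
        for child in children.get(topic, []):
            if child not in seen:  # a topic can have more than one parent
                seen.add(child)
                queue.append(child)
    return order


def writers_url(topic_name):
    # Assumption: the URL form of a topic name replaces spaces with hyphens.
    return f"https://www.quora.com/topic/{topic_name.replace(' ', '-')}/writers"


# Hypothetical fragment of the ontology.
children = {
    "Competitive Programming": ["Algorithms", "ACM ICPC"],
    "Algorithms": ["Dynamic Programming", "ACM ICPC"],
}
topics = traverse_topics(children, ["Competitive Programming"])
urls = [writers_url(t) for t in topics]
```

The `seen` set matters because the ontology is not a strict tree: in this example, "ACM ICPC" is listed under two parents but is visited once.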

Once I had the 30 pages, I used the XPath technique described last week to get links to the answers written by the most viewed writers. One difference from last week is that I only extracted links for the top 10 writers for each topic, since that’s all an anonymous user (like my automated tool) can see.
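Once the answer links are extracted (the XPath expressions themselves aren’t reproduced here), turning them into a question list is straightforward. A minimal sketch, assuming answer URLs have the shape `https://www.quora.com/<question-slug>/answer/<writer>` and that writers arrive already ordered by views; the function names and sample data are hypothetical.

```python
from urllib.parse import urlsplit


def question_url(answer_url):
    """Drop the '/answer/<writer>' suffix to get the question URL (assumed URL shape)."""
    parts = urlsplit(answer_url)
    path = parts.path.split("/answer/", 1)[0]
    return f"{parts.scheme}://{parts.netloc}{path}"


def questions_from_top_writers(answers_by_writer, limit=10):
    """Collect distinct question URLs from the first `limit` writers.

    `answers_by_writer` is a list of (writer, [answer URLs]) pairs, most viewed first.
    The default of 10 matches what an anonymous visitor can see on a /writers page.
    """
    questions, seen = [], set()
    for _writer, answer_urls in answers_by_writer[:limit]:
        for url in answer_urls:
            q = question_url(url)
            if q not in seen:
                seen.add(q)
                questions.append(q)
    return questions


# Questions already collected from the Competitive Programming topic (hypothetical).
already_found = {"https://www.quora.com/How-do-I-start-competitive-programming"}
answers = [
    ("Writer-A", ["https://www.quora.com/How-do-I-start-competitive-programming/answer/Writer-A"]),
    ("Writer-B", ["https://www.quora.com/Which-math-topics-matter-in-competitive-programming/answer/Writer-B"]),
]
new = [q for q in questions_from_top_writers(answers) if q not in already_found]
```

The final filter keeps only questions that weren’t already found under the Competitive Programming topic, which is how a count of additional questions from related topics would be computed.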

Results

Because competitive programming questions are often tagged with the top-level Competitive Programming topic, I got good question coverage just by looking at that topic. Last week, I found over 3000 questions with answers written by the 50 most viewed writers in the topic. However, more questions can be found by exploring related topics. I found about 1300 more by looking at the 10 most viewed writers in relevant topics with /writers pages. Here are some examples.

Although Dr. Thomas Cormen (co-author of the famous textbook Introduction to Algorithms) never participated in competitive programming himself, he appreciates the creative thinking that goes into solving programming puzzles. He doesn’t think competitive programming is for everyone, though, so he doesn’t emphasize it in his introductory classes.

Adam D’Angelo is CEO of Quora, a former competitive programmer, and a top writer in related topics. He wrote popular answers to these questions.

On the subject of getting started in competitive programming, Nikhil Garg has several pieces of advice, including a suggestion to rewrite your solution from scratch after you get it accepted the first time.

On the subject of time management, two answers to this question suggest keeping a problem in your head so you can work on it when you’re not in front of a computer.

Last week I linked to a popular question about which algorithms and data structures to learn for competitive programming. Here’s a similar question about mathematics topics.

(Image credit: John Bell)

CPFAQ: Most Viewed Writers

To find questions for the FAQ, I started with a set of relevant Quora topics, and collected the questions under those topics. That gave me a master list of questions. But because of question duplication and other Quora data quality problems, I needed ways to rank the questions in the list so I could focus my efforts on the best ones. One data point I use is follower count. This gives me the questions that the most people are interested in, whether or not they have good answers. This week I’m going to look at the answer upvotes metric, an indicator of answer quality. Since a FAQ has both questions and answers, it’s important to identify good answers as well as good questions.
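Once the data is collected, ranking the master list by either signal is a simple sort. A minimal sketch, assuming each question record carries a follower count and the upvote count of its most-upvoted answer; the `Question` fields and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Question:
    url: str
    followers: int           # how many people follow the question
    top_answer_upvotes: int  # hypothetical field: upvotes on the most-upvoted answer


def rank(questions, key):
    """Return questions sorted from highest to lowest by the given metric."""
    return sorted(questions, key=key, reverse=True)


questions = [
    Question("a", followers=10, top_answer_upvotes=500),
    Question("b", followers=300, top_answer_upvotes=20),
    Question("c", followers=50, top_answer_upvotes=100),
]
by_followers = [q.url for q in rank(questions, lambda q: q.followers)]
by_upvotes = [q.url for q in rank(questions, lambda q: q.top_answer_upvotes)]
```

The two orderings can disagree sharply: a question many people follow may have no strong answer yet, while a little-followed question may have an excellent one.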
