I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.
The Competitive Programming topic on Quora, and related topics, contain thousands of examples of what people want to know about that subject. Together, they are the definitive source for deciding what qualifies as a frequently asked question for CPFAQ. But many of these questions are duplicates, which makes it difficult to find the best answers to a question. As I mentioned last week, I have a process for merging some of these duplicates, but Quora automation often works against the process, despite Quora's stated opposition to duplicate questions. This week, I worked on a basic tool to help me keep track of merges.
Quora lets any user merge questions, but the Quora Content Review (QCR) bot reserves the right to unmerge them. Users can then report bot activity for review by a human Quora employee. So for Quora users trying to clean up questions, the merge process goes like this:
- Merge two or more questions.
- Check a merged question.
- If it’s still merged, declare victory.
- Otherwise, report the unmerge for human review.
If the bot wants to revert a merge, it usually acts within a few hours of the merge. Human intervention can take days. I started tracking merges in a spreadsheet so I could go back and check on them. But as the list gets longer, checking each link by hand gets tedious. Some automation is in order.
Each merge involves two or more questions:
- A canonical title and a link to the merge target.
- A list of candidate links to merge into the target.
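One way to model a merge in code is a small class holding those pieces. This is a hypothetical sketch, not the tool's actual type: the name MergeSet and its members are my own. It enforces the "two or more questions" rule by requiring at least one candidate alongside the target.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one merge = a canonical title, the URL of the merge
// target, and the candidate URLs that should redirect to that target.
public sealed class MergeSet
{
    public string CanonicalTitle { get; }
    public string TargetUrl { get; }
    public IReadOnlyList<string> CandidateUrls { get; }

    public MergeSet(string canonicalTitle, string targetUrl, IEnumerable<string> candidateUrls)
    {
        CanonicalTitle = canonicalTitle ?? throw new ArgumentNullException(nameof(canonicalTitle));
        TargetUrl = targetUrl ?? throw new ArgumentNullException(nameof(targetUrl));
        CandidateUrls = candidateUrls?.ToList() ?? throw new ArgumentNullException(nameof(candidateUrls));

        // A merge involves two or more questions: the target plus at least one candidate.
        if (CandidateUrls.Count == 0)
            throw new ArgumentException("A merge needs at least one candidate.", nameof(candidateUrls));
    }

    // Total number of questions involved in this merge (target + candidates).
    public int QuestionCount => 1 + CandidateUrls.Count;
}
```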
I wrote a command-line tool to carry out these steps:
Read questions from a text file
As I find duplicate questions in Quora, I record them in a text file. The file is divided into sections that each include a canonical question and a list of merge candidates. For example:
How do I get started with competitive programming?
https://www.quora.com/What-do-I-need-to-know-to-start-competitive-programming
https://www.quora.com/How-do-you-become-capable-of-competitive-programming
https://www.quora.com/How-should-one-start-preparing-for-competitive-coding
https://www.quora.com/How-can-I-start-preparing-for-competitive-coding-challenge
https://www.quora.com/How-should-I-prepare-myself-for-competitive-coding
At the top of the section are the canonical question title and a link to the question to use as the merge target. Notice that the canonical title (How do I get started with competitive programming?) and the Quora title (What do I need to know to start competitive programming) don't have to match. As I explained in The Value of Canonical Questions, I use unique canonical question titles for pages in the FAQ. Here's the FAQ page for this example: How do I get started with competitive programming?
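A parser for this format might look like the following sketch. It assumes a layout that the example implies but doesn't spell out: any line that isn't a URL starts a new section, the first URL after it is the merge target, later URLs are merge candidates, and blank lines don't matter. MergeFileParser and Parse are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for the merge file. Assumed layout: a non-URL line starts
// a new section (the canonical title); the first URL after it is the merge
// target; any further URLs are merge candidates. Blank lines are ignored.
public static class MergeFileParser
{
    public static List<(string Title, string TargetUrl, List<string> CandidateUrls)> Parse(IEnumerable<string> lines)
    {
        var sections = new List<(string Title, string TargetUrl, List<string> CandidateUrls)>();
        string? title = null;
        string? target = null;
        var candidates = new List<string>();

        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0) continue;

            bool isUrl = line.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
                      || line.StartsWith("https://", StringComparison.OrdinalIgnoreCase);

            if (!isUrl)
            {
                // A new title closes the previous section.
                Flush();
                title = line;
                target = null;
                candidates = new List<string>();
            }
            else if (title == null)
            {
                throw new FormatException($"URL before any title: {line}");
            }
            else if (target == null)
            {
                target = line;          // first URL = merge target
            }
            else
            {
                candidates.Add(line);   // later URLs = merge candidates
            }
        }

        Flush();
        return sections;

        // Saves the current section; a title with no URLs at all is skipped.
        void Flush()
        {
            if (title != null && target != null)
                sections.Add((title, target, candidates));
        }
    }
}
```

With the file on disk, a call such as MergeFileParser.Parse(File.ReadLines(path)) would read it line by line, where path points at the merge file.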
When it starts up, the tool reads the contents of this text file into a list of
So far, I have described a text file that I update manually based on questions I find and merge in Quora. But people and bots can merge and unmerge questions any time, which could lead to the text file getting out of date. What I want is a way to keep the text file up to date with Quora. To do this, I can use
The goal is for the tool to open each link and verify an expected result. For the canonical link, the expected result is no redirection. If someone merges the canonical question into another question, then it no longer qualifies as a canonical question. In that case, I need to decide whether to unmerge it, or leave it as-is. So the tool first verifies that the canonical link does not redirect.
Next, the tool checks each merge candidate link and verifies that it does redirect. There are a few ways to verify a redirect using WebRequest, but for Quora links, one simple way is to check whether the query string contains a parameter called redirected_qid. Quora adds this parameter when it redirects from one question to another.
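The two checks can be kept separate from the network code. In this sketch, redirect detection is passed in as a delegate, so the rules can be tested with a fake lookup; in a real run it would wrap a web request. MergeChecker, FindUnexpected, and the message wording are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the two checks: the canonical link must NOT redirect, and every
// merge candidate MUST redirect. The isRedirected delegate decides what
// "redirected" means, so this logic never touches the network itself.
public static class MergeChecker
{
    public static List<string> FindUnexpected(
        string canonicalUrl,
        IEnumerable<string> candidateUrls,
        Func<string, bool> isRedirected)
    {
        var problems = new List<string>();

        // Someone merged the canonical question into another question.
        if (isRedirected(canonicalUrl))
            problems.Add($"Canonical question redirects: {canonicalUrl}");

        // The merge was reverted, or never took effect.
        foreach (var url in candidateUrls)
        {
            if (!isRedirected(url))
                problems.Add($"Candidate does not redirect: {url}");
        }

        return problems;
    }
}
```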
Here's a basic pattern for using WebRequest to verify redirection:
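```csharp
using System.Net;

var request = (HttpWebRequest)WebRequest.Create(urlAddress);
// Redirects are followed by default, so ResponseUri is the final URL.
using var response = (HttpWebResponse)request.GetResponse();
var isMerged = response.ResponseUri.Query.Contains("redirected_qid");
```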
If urlAddress contains a question URL, then isMerged will be true if the question has been merged, and false if it hasn't.
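Query.Contains matches the text redirected_qid anywhere in the query string, including inside another parameter's value. If that ever matters, a stricter check can compare parameter names. This is a hypothetical helper (RedirectDetector and HasRedirectedQid are my names), and it assumes Quora keeps using redirected_qid as the parameter name; the URLs in the test are made up.

```csharp
using System;

public static class RedirectDetector
{
    // Returns true if the URI's query string has a parameter named exactly
    // "redirected_qid" (with or without a value). Stricter than a substring
    // test, which would also match a query such as "?note=redirected_qid".
    public static bool HasRedirectedQid(Uri uri)
    {
        // Uri.Query includes the leading '?', or is empty when there's no query.
        var query = uri.Query.TrimStart('?');
        foreach (var pair in query.Split('&', StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            var key = eq >= 0 ? pair.Substring(0, eq) : pair;
            if (Uri.UnescapeDataString(key) == "redirected_qid")
                return true;
        }
        return false;
    }
}
```

It would replace the last line of the pattern above: var isMerged = RedirectDetector.HasRedirectedQid(response.ResponseUri);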
Write an output text file
As it processes each question, the tool shows whether the redirect result is unexpected. For example, if a canonical question is redirected or a merged question is not redirected, manual intervention is required. These unexpected results are flagged in an output text file. This allows me to focus on the questions that need attention, without having to manually open each one.
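Here's a sketch of that output step, writing only the sections that have problems. The input shape (a title paired with its list of problem messages), the name MergeReport, and the indented layout are my assumptions, not the tool's actual format. Writing to a TextWriter lets the same code target a file (via File.CreateText) or a StringWriter in tests.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch of the report step: only sections with unexpected results are
// written, so the output lists just the questions that need attention.
public static class MergeReport
{
    // Returns the total number of flagged problems written.
    public static int Write(TextWriter writer,
        IEnumerable<(string Title, IReadOnlyList<string> Problems)> results)
    {
        int flagged = 0;
        foreach (var (title, problems) in results)
        {
            if (problems.Count == 0) continue;  // everything as expected

            writer.WriteLine(title);
            foreach (var problem in problems)
                writer.WriteLine("  " + problem);
            writer.WriteLine();
            flagged += problems.Count;
        }
        return flagged;
    }
}
```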
(Image credit: torbakhopper)