I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.
One result of classifying many Quora questions is finding many duplicates. Quora knows about this and provides a Merge function. But as I have written about before, there’s also a content review bot that unmerges questions it thinks are not similar enough. I did some more investigation into this bot’s behavior, which I’ll describe this week.
The Merge Process
Here’s how my question merge process works:
1. As I’m classifying new questions, I see one that looks familiar. So I look through my classified list for a similar question to use as a merge target.
2. I save links to the merge target and merge candidate (the question I want to merge with it).
3. I merge the two questions in the Quora UI.
4. The next day, I check the merge candidate by clicking my saved link.
5. If instead of getting the merge candidate I see the merge target with a message like “You were redirected because the question (question title) was merged with this question,” that’s a good sign. It means the merge process succeeded. And the questions will probably stay merged, since if a bot is going to revert a merge, it will usually do so within a few hours. (A human Quora user or administrator could still decide to unmerge them later.)
6. But more often, I’ll just see the merge candidate, which means the merge was reverted. In this case, the question log will say “(question) was unmerged from this question by Quora Content Review.” Quora Content Review (QCR) is the Quora bot that handles merging and unmerging, among other question edits.
7. Now I have a few options: (a) I can leave the questions as-is; (b) I can Revert the QCR action, which re-merges the questions; (c) I can Report the QCR action, which seems to send the case to a human for evaluation.
8. If I choose (b) or (c), I wait a day and return to step 4 to check the result. Then I repeat the process until the questions stay merged, or I give up.
In my experience, option (b) isn’t useful. If the QCR bot didn’t like the merge the first time, it probably won’t like it the second time around. Option (c) takes longer, but it’s more effective, since the bot won’t override a Quora employee’s decision.
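The check-and-retry loop above is easy to lose track of once several merges are in flight. Here is a minimal sketch of how it could be tracked in Python. It is only a bookkeeping aid: it makes no calls to Quora, and the names `MergeAttempt`, `Status`, `record_check`, and `retry` are hypothetical, invented for this illustration. The result of each check still comes from clicking the saved link by hand.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # merged in the UI, not yet checked
    MERGED = "merged"        # saved candidate link redirected to the target
    REVERTED = "reverted"    # candidate still loads, so the merge was undone


@dataclass
class MergeAttempt:
    """One merge target/candidate pair (hypothetical record, not a Quora API)."""
    target_url: str
    candidate_url: str
    last_action_on: date
    status: Status = Status.PENDING
    history: list = field(default_factory=list)

    def next_check(self) -> date:
        # The manual process waits one day before checking the saved link.
        return self.last_action_on + timedelta(days=1)

    def record_check(self, redirected: bool, today: date) -> None:
        # `redirected` is what I observed by hand when clicking the saved link.
        self.status = Status.MERGED if redirected else Status.REVERTED
        self.history.append((today, self.status.value))

    def retry(self, action: str, today: date) -> None:
        # "revert" is option (b); "report" is option (c).
        if action not in ("revert", "report"):
            raise ValueError("action must be 'revert' or 'report'")
        self.history.append((today, action))
        self.last_action_on = today
        self.status = Status.PENDING


attempt = MergeAttempt(
    target_url="https://example.com/merge-target",        # placeholder URL
    candidate_url="https://example.com/merge-candidate",  # placeholder URL
    last_action_on=date(2018, 1, 15),
)
attempt.record_check(redirected=False, today=attempt.next_check())
if attempt.status is Status.REVERTED:
    attempt.retry("report", today=date(2018, 1, 16))
print(attempt.status.value, attempt.next_check())
```

The `history` list keeps every check and retry, which makes it easier to decide when a pair has been through enough rounds to give up on.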
Here’s the most egregious example I have seen of QCR merge revert activity:
- Merge target: What is the best strategy to improve my skills in competitive programming in C++ in 2-3 months? This is a famous/infamous question in the Competitive Programming topic, and it has 117 questions merged into it.
- Merge candidate: What is the best strategy to improve my skills in competitive programming in C++ in 4-5 months?
Notice how similar these questions are. The distinction between “2-3 months” and “4-5 months” is irrelevant in this context. The QCR bot has allowed merges between questions that differ far more than these two do, so I’m not sure why it rejects this one. My guess is that it has something to do with the unusual popularity of the target question. But I have merged other questions into that target question, so popularity can’t be the only reason.
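To put a rough number on “how similar,” here is a small sketch using `difflib.SequenceMatcher` from the Python standard library. This is only a character-level comparison chosen for illustration. I don’t know how the QCR bot actually scores question pairs, and the helper name `title_similarity` is my own.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 character-level similarity ratio between two titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


target = ("What is the best strategy to improve my skills in "
          "competitive programming in C++ in 2-3 months?")
candidate = ("What is the best strategy to improve my skills in "
             "competitive programming in C++ in 4-5 months?")

# The two titles differ in only two characters, so the ratio is close to 1.0.
print(round(title_similarity(target, candidate), 3))
```

A plain string ratio like this says nothing about meaning, so it can’t explain the bot’s decision. It only confirms that the wording of the two titles is nearly identical.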
In a second example, an anonymous user created these three questions within a few minutes of each other:
1. Is it a common practice and ethical in competitive coding? (yes, that’s the full title): 14 Jan 2018 11:59 PM
2. Is copying code from online source an ethical and common practice in competitive coding during an online contest?: 15 Jan 2018 12:06 AM
3. Is copying code from a online source without due credit to the web source an ethical and common practice in competitive coding (esp during a online contest)?: 15 Jan 2018 12:12 AM
Sometimes people create multiple similar questions to attract interest and answers. But considering the timing and wording of these questions, it seems more likely that the user was trying to edit the question title for a single question, and ended up creating three separate questions.
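The timing is easy to verify from the creation times in the question logs. Here is a short sketch that parses the three timestamps listed above and computes the gaps between them. The format string assumes English month abbreviations, and the helper name `gaps_in_minutes` is hypothetical.

```python
from datetime import datetime

# Matches timestamps such as "14 Jan 2018 11:59 PM".
LOG_FORMAT = "%d %b %Y %I:%M %p"


def gaps_in_minutes(stamps):
    """Return the whole-minute gaps between consecutive timestamps."""
    times = [datetime.strptime(s, LOG_FORMAT) for s in stamps]
    return [
        int((later - earlier).total_seconds() // 60)
        for earlier, later in zip(times, times[1:])
    ]


created = [
    "14 Jan 2018 11:59 PM",
    "15 Jan 2018 12:06 AM",
    "15 Jan 2018 12:12 AM",
]
print(gaps_in_minutes(created))  # [7, 6]
```

So the second question appeared 7 minutes after the first, and the third 6 minutes after that. The titles also get progressively more detailed, which is consistent with someone revising a question rather than asking three different ones.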
Given the similarities in wording and creation time, you would expect the QCR bot to be happy to have these questions merged. And the merge did partially work: QCR allowed merging #1 with #2, but reverted the merge of #2 with #3.
Spending Time on Merging
Quora says it wants fewer duplicate questions, but the QCR bot makes the merge process so time-consuming that it’s hard to see why people bother with it. Many Quora users already post questions without checking for duplicates. Then when people try to fix that problem, the QCR bot fights them. So not only do we have to find questions to merge, but we also have to come back later to make sure they stay merged. On Stack Exchange, users with sufficient reputation can mark questions as duplicates, or delete them. It’s not surprising that the question quality is higher there.
(Image credit: Nancy)