Building trust in media
Mark asked me to put together a more complete collection of Wikipedia example articles.
These are a few of the articles I’ve marked up, with the goal of automating the process of converting existing quotes to Contextual Citations:
** These articles have been completely marked up, including Books that are not available online.
Here’s how I imagine CiteIt.net’s Contextual Citations could be integrated with Wikipedia:
In phase 1, citations from sample articles could be marked up manually. This trial run would most likely start with lower-profile pages. If it were successful, Wikipedia editors could be given the ability to mark up citations through the editor, and citations could be indexed either manually or automatically upon publication. Since most people would not know about the ability to create contextual citations using the Wikipedia editor, this phase would also be fairly low-profile.
Improving the accuracy of the returned CiteIt Context is a precondition for greater adoption.
My friend Bryan said that it should be possible to go back and automatically convert all of Wikipedia’s quotes to use CiteIt’s Contextual Citations. This assumes the Pareto principle: roughly 80% of the citations could be converted by a script following some straightforward rules, and the remaining citations would have to be processed manually.
With that in mind, I set out over the past month to do a fairly thorough analysis of the types of issues that we could encounter if we chose to automate the conversion. As part of the process, I would hope to create a database of citations to be “upgraded” and an interface for reporting and fixing bugs.
From this database, we could both:
If someone wanted to automate this further, it might be a good candidate for machine learning, but the Wikipedia community’s philosophy towards automation and error handling would determine how the technology is developed.
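To make the idea of a citation database more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, column names, and the `automated_share` helper are all hypothetical; nothing here reflects an existing CiteIt schema. The point is only to show how tagging each citation with its error classes would let us check the 80% assumption against real counts.

```python
import sqlite3


def create_citation_db(path=":memory:"):
    """Create a small table of citations to be upgraded (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS citation (
            id INTEGER PRIMARY KEY,
            article TEXT NOT NULL,       -- Wikipedia article title
            quote TEXT NOT NULL,         -- quoted text as it appears in the article
            source_url TEXT,             -- URL taken from the footnote, if any
            css_class TEXT NOT NULL,     -- space-separated error classes, '' if clean
            status TEXT NOT NULL DEFAULT 'open'
        )
        """
    )
    return conn


def add_citation(conn, article, quote, source_url, css_class=""):
    """Record one citation together with the error classes assigned to it."""
    conn.execute(
        "INSERT INTO citation (article, quote, source_url, css_class) "
        "VALUES (?, ?, ?, ?)",
        (article, quote, source_url, css_class),
    )


def automated_share(conn):
    """Fraction of citations with no error class, i.e. converted without manual work."""
    total, clean = conn.execute(
        "SELECT COUNT(*), SUM(css_class = '') FROM citation"
    ).fetchone()
    return (clean or 0) / total if total else 0.0
```

A bug-reporting interface could then list the rows where `css_class` is not empty and `status` is still `'open'`.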
I did a more thorough job of analyzing the Hillary Clinton and Donald Trump articles, going through every quotation in the article (excluding the references at the end of the article) and marking each quote up with a q-tag and CSS classes, indicating the reason why the citation couldn’t be properly matched.
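As an illustration of that markup, the following sketch builds a `<q>` element with a `cite` attribute and one or more of the error classes. The `build_q_tag` function is a hypothetical helper written for this post, not part of CiteIt; it only shows the shape of the HTML I produced by hand.

```python
from html import escape


def build_q_tag(quote, source_url, classes=()):
    """Return a <q> element citing source_url, with optional CSS error classes."""
    attrs = f'cite="{escape(source_url, quote=True)}"'
    if classes:
        attrs += f' class="{escape(" ".join(classes), quote=True)}"'
    # Escape the quote text so characters like < or & stay valid HTML.
    return f"<q {attrs}>{escape(quote, quote=False)}</q>"


print(build_q_tag(
    "pivot to Asia",
    "https://en.wikipedia.org/wiki/Special:BookSources/978-0-8050-9511-1",
    ["citeit-offline-isbn"],
))
```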
You can see highlights of all the errors if you click on the blue “Show Citation Errors” button in the upper-right corner of the yellow header:
Below is a list of the CSS error classes created when marking up sample Wikipedia articles.
Category | Class | Description | Example | Color |
---|---|---|---|---|
Error | citeit-automation-error | An automated bot would likely pull an inappropriate match. Should we return multiple matches and show all results to editors or readers? | Trump favors neutral or positive balances of trade over negative balances of trade, also known as a “ “Share of total U.S. merchandise trade deficit by country” Explanation: The first result from the first footnote’s source is returned, but this is a legend title rather than the actual article body. It would be preferable to use a later match. Perhaps a UI could be built that displays all matched results and gives Wikipedia editors the ability to choose the preferred instance. | red |
Error | citeit-error-context | Context is Returned, but Incorrect | Example: convention’s “veiled” racist messages | #ff3399 |
Error | citeit-error-quote-returned | Error in Returned Quote | Clinton asserted President Trump’s 2018 budget proposal was “a con” for underfunding domestic programs Returned: Clinton called Mr. Trump’s 2018 budget proposal ” a cong “ which she said would underfund public education | red |
Error | citeit-error-quote-context-edges | Slight character errors in Surrounding Context | “adding that her mother Dorothy “made sure I learned [these] words from our Methodist faith” Quote: “And she made sure I learned [these] ds from our Methodist faith” | red |
Error | citeit-error-unknown | Error of Unknown Type | Emoluments Clause as HTML: Emoluments Clause as <q cite="https://www.usatoday.com/story/news/politics/2019/10/21/donald-trump-mocks-constitution-emoluments-clause-phony/4055162002/" class="citeit-error-unknown">“phony”</q> | red |
Error | citeit-error-404 | Source URL returns 404 error | Clinton called for a constitutional amendment to limit “unaccountable money“ | red |
| | citeit-pdf-scanned | Source Document is a PDF image that needs to be scanned with OCR | memorandum saying “ | #0099ff |
Match | citeit-footnote-interspersed | There is a footnote interspersed in the middle of the quote that could throw off the match | Correct and consistent use of latex condoms can reduce the risk of syphilis only when the infected area or site of potential exposure is protected.[41] However, a syphilis sore outside of the area | red |
Match | citeit-footnote-shortname | Footnote is a short form of citation. Need to cross-reference the short name with the longer reference in the references cited section. | The footnote is listed in the short form (Lastname, page): Troy 2006, pp. 176–77. References cited: Troy, Gil (2006). Hillary Rodham Clinton: Polarizing First Lady. Lawrence, Kansas: University Press of Kansas. ISBN 978-0-7006-1488-2. | red |
Wiki | citeit-footnote-later | The match is not found in the first footnote, but in a second footnote. | “In July 2016, she “committed” to introducing a U.S. constitutional amendment” 1st Source: not found 1st Source URL: http://www.cnn.com/2016/07/16/politics/hillary-clinton-campaign-finance/ 2nd Source: Hillary Clinton committed Saturday 2nd Source URL: https://www.politico.com/story/2016/07/hillary-clinton-citizens-united-225658 | #cc6600 |
YouTube | youtube-video | The source is a YouTube URL | Example: “The Hillary Shimmy Song”. September 28, 2016. Retrieved September 16, 2017 – via YouTube. | #3366ff |
Wiki | wiki-legend | The quote is found in the legend of a Wikipedia image, and the footnote may come before the quote | “Fact-checkers from The Washington Post,[839] the Toronto Star,[840] and CNN[841] compiled data on “false or misleading claims” (orange background), and “false claims” (violet foreground), respectively.” | #3366ff |
Wiki | wiki-note | Internal Wikipedia Note | Clinton into “imaginary discussions” with the also-politically active Eleanor Roosevelt.[f] f. The Eleanor Roosevelt “discussions” were first reported in 1996 by The Washington Post writer Bob Woodward; they had begun from the start of Hillary Clinton’s time as first lady.[154] | #3366ff |
Wiki | wiki-multiple-source | Wikipedia Source Citation Record Contains Multiple Sources | Calabresi, Massimo (November 7, 2011). “Hillary Clinton and the Rise of Smart Power”. Time. pp. 26–31. See also “TIME magazine editor explains Hillary Clinton’s ‘smart power'”. CNN. October 28, 2011. Wikipedia article: Hillary Clinton | #3366ff |
Match | citeit-numbers-written-out | The match is not found because the quoted text writes a number out in words rather than digits | to a willingness “to remold society by redefining what it means to be a human being in the twentieth century, moving into a new millennium” Source: “Let us be willing,” she urged in conclusion, “to remold society by redefining what it means to be a human being in the 20th century, moving into a new millennium.” | #3366ff |
Match | citeit-text-from-source | Quote text needed to be replaced from the source | “can’t .. miss” Wikipedia: (Outdated): “can’t miss” “can’t afford to miss” | #3366ff |
Match, Feature | citeit-later-match | A later match (2nd or 3rd) would be preferable | “genocidal taunts” Source: Quote found in the title, but better context is found in 2nd match in the article body. | #cc6600 |
Match, Feature | citeit-feature-added-word | A word is added to the quote using brackets | “[although] “we did not find clear evidence that Secretary Clinton or her colleagues intended to violate laws “ | #cc6600 |
Match | citeit-formatting-mismatch | Looks like it matches, but doesn’t because of a formatting mismatch. | “contends that they are not shapes of constellations but of what might be called <i>counter constellations</i>, the irregular-shaped dark patches within the twinkling expanse of the <a href="https://en.wikipedia.org/wiki/Milky_Way" title="Milky Way">Milky Way</a>” | #ff99ff |
Match | citeit-change-case | Matches except for changing from upper to lower case or vice versa | [T]he mudslides and heavy rains did not appear to have caused any significant damage to the Nazca Lines | #ff99ff |
Match | citeit-hyphen-change | Matches except for changing hyphenation | Nazca Lines (TODO: find an example) | #ff99ff |
Match | citeit-punctuation-change | Matches except for punctuation changes. Example: the quoted source text ends in a comma, but the Wikipedia quote uses a period | “Her articles were important, not because they were radically new but because they helped formulate something that had been inchoate.”[63] “Her articles were important, not because they were radically new but because they helped formulate something that had been inchoate,“ Professor Fox said | #ff99ff |
Match | citeit-omit-text-from-source | The original source includes text that is not in the citing quote | “Let me repeat what I have repeated for many months now, I never received nor sent any material that was marked classified.” “Let me just repeat what I have repeated for many months now,” she said in the interview on “Meet the Press.” “I never received nor sent any material that was marked classified” | #ff99ff |
Match, Feature | citeit-feature-ellipses | The quote is interrupted by ellipses and then later continued | “There has never been a better time in history to be born a woman … this data shows just how far we still have to go.” | #ff99ff |
Match | citeit-non-quote | Although quotation marks are used, the quoted text is a title or term, not a quote. | Examples: “filegate“, “Hillary Doctrine“ TODO: remove CiteIt link from quote so normal link is visible | #996600 |
Offline | citeit-offline-no-isbn | A publication without an ISBN that is not available online | “<q cite="" class="citeit-non-quote citeit-offline-no-isbn">Children’s Rights: A Legal Perspective</q>” in 1979 | #3366ff |
Offline | citeit-offline-isbn | A book that is not available online but has an ISBN | “<q cite="https://en.wikipedia.org/wiki/Special:BookSources/978-0-8050-9511-1" class="citeit-offline-isbn">pivot to Asia</q>” | #3366ff |
Private | citeit-paywall | The document requires a subscription | “<q cite="https://www.jstor.org/stable/795794" class="citeit-paywall citeit-non-quote citeit-later-match">Children’s Policies: Abandonment and Neglect</q>” | #666699 |
Google Books ** | google-books | Google Books | Nazca Lines | #6600cc |
Private | citeit-edu | An academic document requires a subscription | “<q cite="https://meridian.allenpress.com/her/article-abstract/43/4/487/30983/Children-Under-the-Law" class="citeit-paywall citeit-edu">Children Under the Law</q>” | #6600cc |
| | citeit-twitter | Twitter generates its HTML using JavaScript, which I hope a future version of CiteIt can handle | <q cite="https://thegolfnewsnet.com/golfnewsnetteam/2018/07/14/donald-trump-exercise-golf-cart-turnberry-110166/" class="citeit-twitter">primary form of exercise</q> | #3366ff |
Archive.org Borrow | citeit-archive-org-borrow | The source is available to borrow electronically through Archive.org | <q cite="https://archive.org/details/herwayhopesambit00gert" class="citeit-archive-org-borrow citeit-later-match">Hillarycare</q> | #6699ff |
Wayback Machine | citeit-archive-org | Wayback Machine and other Archive.org content that doesn’t require checking out | <q cite="https://archive.org/details/herwayhopesambit00gert" class="citeit-archive-org-borrow citeit-later-match">Hillarycare</q> | #33ccff |
Results | citeit-no-context | A Match is found, but no Context is Returned | “<q cite="https://time.com/5309425/donald-trump-kim-jong-un-summit-document-full-text/" class="citeit-no-context">I’m not going to rule out a military option</q> | #ff6600 |
Match | citeit-no-match | The Quote was not found in the Cited Source | <q cite="http://content.time.com/time/magazine/article/0,9171,2097973,00.html" class="citeit-no-context">convening power</q> | #cc0000 |
Editing | citeit-better-link | An alternative source is used instead because it provides better context. Requires creating a new footnote | <q cite="https://www.washingtonpost.com/news/the-fix/wp/2016/07/28/here-is-hillary-clintons-presidential-nomination-acceptance-speech/" class="citeit-better-link">words from our Methodist faith</q> | green |
Best-Practices | citeit-naked-quote | The citation is a quote of a quote, without the original context. When CiteIt pulls in the context, the context is from the secondary source rather than the quoted source | Suppose Wikipedia quotes an article in the New York Times, and the New York Times quotes the President. How much of CiteIt’s 500 characters of context is from the New York Times and how much is from the President? A naked quote would not have any context from the primary source (the President). All of the context CiteIt finds would be from the secondary source. | |
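
Several of the Match classes above differ from the source only in case, punctuation, hyphenation, an interspersed footnote marker, or an ellipsis. As a rough sketch of the kind of "straightforward rules" a conversion script might follow, the code below builds a tolerant regular expression from a quote and returns every match with its surrounding context, so an editor could pick a later match if the first one is a title or legend. This is not CiteIt's actual matching code: the function names, the 500-character window on each side, and the 300-character ellipsis gap are all assumptions made for illustration. It does not handle numbers written out as words, bracketed words that were added to a quote, or text omitted without an ellipsis.

```python
import re

CONTEXT_CHARS = 500     # assumed context window on each side of a match
MAX_ELLIPSIS_GAP = 300  # assumed limit on how much text an ellipsis may skip

FOOTNOTE = re.compile(r"\[\d+\]")   # numeric footnote markers such as [41]
ELLIPSIS = re.compile(r"\.{2,}|…")  # "..", "..." or the single-character ellipsis


def quote_pattern(quote):
    """Build a regex that ignores case, punctuation, hyphens and footnote markers."""
    quote = FOOTNOTE.sub("", quote)
    quote = re.sub(r"[\[\]]", "", quote)  # "[T]he" becomes "The"
    segments = []
    for part in ELLIPSIS.split(quote):
        words = re.findall(r"\w+", part)
        if words:
            # Any run of non-word characters may separate two words.
            segments.append(r"\W+".join(re.escape(w) for w in words))
    if not segments:
        raise ValueError("quote contains no words")
    gap = r"\b.{0,%d}?\b" % MAX_ELLIPSIS_GAP
    # The outer \b stops "a con" from matching inside "a cong".
    return re.compile(r"\b" + gap.join(segments) + r"\b", re.IGNORECASE | re.DOTALL)


def find_matches(quote, source_text):
    """Return every match in source_text, each with its surrounding context."""
    matches = []
    for m in quote_pattern(quote).finditer(source_text):
        matches.append({
            "start": m.start(),
            "before": source_text[max(0, m.start() - CONTEXT_CHARS):m.start()],
            "matched": m.group(0),
            "after": source_text[m.end():m.end() + CONTEXT_CHARS],
        })
    return matches
```

Returning all matches rather than only the first leaves the choice between a title match and a body match to a human, which fits the UI idea in the citeit-automation-error row.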