Building trust in media

Wikipedia Examples

Mark asked me to put together a more complete collection of Wikipedia example articles.

These are a few of the articles I’ve marked up, with the goal of automating the process of converting existing quotes to Contextual Citations:

Ruth Bader Ginsburg

Ruth Bader Ginsburg: Example Wikipedia article

People/Groups

  1. Ruth Bader Ginsburg
  2. Hillary Clinton **
  3. Donald Trump **

Literature

  1. Pride and Prejudice
  2. Hamlet

History

  1. Inauguration of John F. Kennedy
  2. 2000_MI6_attack

Science

  1. Manned Orbiting Laboratory
  2. Syphilis

 

** These articles have been completely marked up, including Books that are not available online.

 

Converting to Contextual Citations

Here’s how I imagine CiteIt.net’s Contextual Citations could be integrated with Wikipedia:

Phase 1: Manual Editing

In phase 1, citations from sample articles could be manually marked up. This trial run would most likely start with lower-profile pages. If this were successful, Wikipedia editors could be given the ability to mark up citations through the editor, and citations could be indexed either manually or automatically upon publication. Since most people would not know about the ability to create contextual citations using the Wikipedia editor, this phase would also be fairly low-profile.

Improving the accuracy of the returned CiteIt Context is a precondition for greater adoption.

  1. The context returned by the web service needs to be more accurate (I’ve cataloged some of the bugs below).
  2. The number of “misses” where the web service incorrectly fails to find the context needs to be reduced.
  3. The speed and scalability of the web service need to be improved.
    • Right now it takes hours to process the Donald Trump article.  The Donald Trump and Hillary Clinton articles were chosen as sample articles because they have a high number of web citations.

Phase 2: Automatic Conversion:

My friend Bryan said that it should be possible to go back and automatically convert all of Wikipedia’s quotes to use CiteIt’s Contextual Citations. This assumes the Pareto principle: roughly 80% of the citations could be converted by a script following some straightforward rules, and the remaining citations would have to be processed manually.
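To make the idea of a "straightforward rule" concrete, here is a minimal sketch of one possible rule: a quoted passage followed directly by a numbered footnote is wrapped in a q-tag whose cite attribute is the footnote's URL. The function name, the regular expression, and the `footnote_urls` mapping are all assumptions for illustration; this is not CiteIt's actual conversion code, and it only handles straight double quotes.

```python
import html
import re

# Hypothetical rule: a straight-quoted passage, optional punctuation,
# then a numbered footnote marker such as [12].
QUOTE_THEN_FOOTNOTE = re.compile(r'"([^"]+)"([.,;:]?)\[(\d+)\]')


def convert_quotes(text, footnote_urls):
    """Wrap quotes whose footnote has a known URL in <q cite="...">.

    Quotes whose footnote number is missing from footnote_urls are left
    untouched so they can be processed manually.
    """
    def replace(match):
        quote, punctuation, number = match.group(1), match.group(2), int(match.group(3))
        url = footnote_urls.get(number)
        if url is None:
            return match.group(0)
        return '<q cite="{}">{}</q>{}[{}]'.format(
            html.escape(url, quote=True), html.escape(quote), punctuation, number
        )

    return QUOTE_THEN_FOOTNOTE.sub(replace, text)
```

Anything the rule does not recognize passes through unchanged, which matches the idea that the remaining citations are handled by hand.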

With that in mind, I set out over the past month to do a fairly thorough analysis of the types of issues that we could encounter if we chose to automate the conversion. As part of the process, I hope to create a database of citations to be “upgraded” and an interface for reporting and fixing bugs.

From this database, we could both:

  • Analyze the accuracy of the program that converts quotes to Contextual Citations
  • Capture the human corrections to the program’s output
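As a rough sketch of how such a database might be laid out, the following uses SQLite with two tables: one for the program's output and one for human corrections. The table names, column names, and the `accuracy` helper are hypothetical and chosen only for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE citation (
    id INTEGER PRIMARY KEY,
    article TEXT NOT NULL,
    quote TEXT NOT NULL,
    source_url TEXT,
    program_context TEXT          -- context returned by the conversion program
);
CREATE TABLE correction (
    id INTEGER PRIMARY KEY,
    citation_id INTEGER NOT NULL REFERENCES citation(id),
    error_class TEXT NOT NULL,    -- e.g. 'citeit-error-context'
    corrected_context TEXT        -- what a human editor entered instead
);
"""


def open_database(path=":memory:"):
    """Create the (hypothetical) schema and return the connection."""
    connection = sqlite3.connect(path)
    connection.executescript(SCHEMA)
    return connection


def accuracy(connection):
    """Fraction of citations that needed no human correction, or None if empty."""
    total = connection.execute("SELECT COUNT(*) FROM citation").fetchone()[0]
    corrected = connection.execute(
        "SELECT COUNT(DISTINCT citation_id) FROM correction"
    ).fetchone()[0]
    return (total - corrected) / total if total else None
```

Keeping corrections in their own table preserves the program's original output, so both the accuracy analysis and the captured human fixes come from the same records.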

Machine Learning Option

If someone wanted to automate this further, the conversion might be a good candidate for machine learning, but the Wikipedia community’s philosophy toward automation and error handling would determine how the technology is developed.

 

Analyzing Errors:

I did a more thorough job of analyzing the Hillary Clinton and Donald Trump articles, going through every quotation in the article (excluding the references at the end of the article) and marking each quote up with a q-tag and CSS classes, indicating the reason why the citation couldn’t be properly matched.

You can see highlights of all the errors if you click on the blue “Show Citation Errors” button in the upper-right corner of the yellow header:

Screenshot: Show Citation Errors
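Once quotes are marked up with q-tags and CSS classes, the classes can be tallied to see which problems are most common. This is a minimal sketch using Python's standard `html.parser`; the class `QuoteClassCounter` and the function `count_error_classes` are names invented for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class QuoteClassCounter(HTMLParser):
    """Counts the CSS classes found on <q> tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "q":
            return
        # A <q> tag without a class attribute contributes nothing.
        classes = dict(attrs).get("class") or ""
        self.counts.update(classes.split())


def count_error_classes(markup):
    """Return a Counter mapping each class on a <q> tag to its frequency."""
    parser = QuoteClassCounter()
    parser.feed(markup)
    parser.close()
    return parser.counts
```

A q-tag can carry several classes at once (for example a paywalled source that is also a title rather than a quote), so each class is counted separately.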

 

Error Codes:

Below is a list of the CSS error classes created when marking up sample Wikipedia articles.

Category Class Description Example Color
Error citeit-automation-error An automated bot would likely pull an inappropriate match. Should we return multiple matches and show all results to editors or readers?

Quote:

Trump favors neutral or positive balances of trade over negative balances of trade, also known as a “trade deficit”. Trump adopted his current skeptical views

Source:

“Share of total U.S. merchandise trade deficit by country”

Explanation: The lookup returns the first result from the first footnote’s source, but this is a legend title rather than the actual article body. It would be preferable to use a later match. Perhaps a UI could be built that displays all matched results and gives Wikipedia editors the ability to choose the preferred instance.

red
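Returning every match rather than only the first could look something like the sketch below, which collects each occurrence of a quote along with a little surrounding text so an editor could pick the preferred one. The function name and the `context_chars` parameter are assumptions for illustration, not part of the CiteIt web service.

```python
def find_all_matches(quote, source_text, context_chars=40):
    """Return every exact occurrence of quote with its surrounding context."""
    matches = []
    start = source_text.find(quote)
    while start != -1:
        end = start + len(quote)
        matches.append({
            "position": start,
            "before": source_text[max(0, start - context_chars):start],
            "after": source_text[end:end + context_chars],
        })
        # Continue searching after the current occurrence.
        start = source_text.find(quote, end)
    return matches
```

With a list like this, a legend title and a body-text occurrence would both be visible, and the choice between them could be left to a person.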
Error citeit-error-context Context is Returned, but Incorrect Example:  convention’s “veiled” racist messages #ff3399
Error citeit-error-quote-returned Error in Returned Quote

Quote:

Clinton asserted President Trump’s 2018 budget proposal was “a con” for underfunding domestic programs

Returned:

Clinton called Mr. Trump’s 2018 budget proposal ” a cong “ which she said would underfund public education

red
Error citeit-error-quote-context-edges Slight character errors in Surrounding Context

Example:

“adding that her mother Dorothy “made sure I learned [these] words from our Methodist faith”

Quote: “And she made sure I learned [these] ds from our Methodist faith”

red
Error citeit-error-unknown Error of Unknown Type

Example:

Emoluments Clause as phony.

HTML: Emoluments Clause as <q cite=”https://www.usatoday.com/story/news/politics/2019/10/21/donald-trump-mocks-constitution-emoluments-clause-phony/4055162002/” class=”citeit-error-unknown”>“phony”</q>

red
Error citeit-error-404 Source URL returns 404 error

Example

Clinton called for a constitutional amendment to limit “unaccountable money”

red
PDF citeit-pdf-scanned Source Document is a PDF image that needs to be scanned with OCR

Example

memorandum saying “the data indicates that the President remains healthy.

HTML: memorandum saying “<q cite=”https://media.arkansasonline.com/news/documents/2020/06/03/Trump_Physical_Exam.pdf” class=”citeit-pdf-scanned”>the data indicates that the President remains healthy.</q>

#0099ff
Match citeit-footnote-interspersed There is a footnote interspersed in the middle of the quote that could throw off the match

Example:

Correct and consistent use of latex condoms can reduce the risk of syphilis only when the infected area or site of potential exposure is protected.[41] However, a syphilis sore outside of the area

red
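One possible way to handle interspersed footnotes is to remove the numbered markers before attempting a match. The sketch below is an assumption about how that could be done; it deliberately strips only numeric markers such as [41], so that bracketed editorial words like [these] are preserved.

```python
import re

# Matches numbered footnote markers such as [41]; bracketed words are untouched.
FOOTNOTE_MARKER = re.compile(r"\[\d+\]")


def strip_footnote_markers(text):
    """Remove numeric footnote markers so the quote can be matched as one piece."""
    return FOOTNOTE_MARKER.sub("", text)
```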
Match citeit-footnote-shortname Footnote is a short form of citation. Need to cross-reference the short name with the longer reference in the references cited section.

The footnote is listed in the short form: Lastname, page

Example:

Troy 2006, pp. 176–77

References cited: Troy, Gil (2006). Hillary Rodham Clinton: Polarizing First Lady. Lawrence, Kansas: University Press of Kansas. ISBN 978-0-7006-1488-2.

red
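Cross-referencing a short footnote with the full reference could be sketched as follows: index the full references by surname and year, then look up the short form. The two regular expressions assume the "Lastname Year, pp. pages" and "Lastname, First (Year)." shapes shown in the example above; other reference styles would need their own patterns, and all names here are hypothetical.

```python
import re

# Short form, e.g. "Troy 2006, pp. 176-77" (also accepts a single "p.").
SHORT_FOOTNOTE = re.compile(
    r"^(?P<surname>[^\d,]+?)\s+(?P<year>\d{4}),\s*pp?\.\s*(?P<pages>.+)$"
)
# Full form, e.g. "Troy, Gil (2006). Title. Publisher."
FULL_REFERENCE = re.compile(r"^(?P<surname>[^,]+),.*?\((?P<year>\d{4})\)")


def index_references(references):
    """Map (surname, year) to the full reference string."""
    index = {}
    for reference in references:
        match = FULL_REFERENCE.match(reference)
        if match:
            index[(match.group("surname").strip(), match.group("year"))] = reference
    return index


def resolve_short_footnote(footnote, index):
    """Return (full_reference, pages) for a short footnote, or None if unresolved."""
    match = SHORT_FOOTNOTE.match(footnote.strip())
    if not match:
        return None
    full = index.get((match.group("surname").strip(), match.group("year")))
    if full is None:
        return None
    return full, match.group("pages")
```

Two works by the same author in the same year would collide in this index, so a real implementation would need a tie-breaker.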
Wiki citeit-footnote-later The match is not found in the first footnote, but in a second footnote.

Quote:

“In July 2016, she “committed” to introducing a U.S. constitutional amendment”

1st Source: not found
1st Source URL: http://www.cnn.com/2016/07/16/politics/hillary-clinton-campaign-finance/
2nd Source: Hillary Clinton committed Saturday
2nd Source URL: https://www.politico.com/story/2016/07/hillary-clinton-citizens-united-225658

#cc6600
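Checking later footnotes when the first one fails could be as simple as trying each source in order. This sketch assumes the source texts have already been downloaded and are passed in as (url, text) pairs; the function name is invented for illustration.

```python
def first_source_containing(quote, sources):
    """Return (position, url) of the first source whose text contains quote.

    sources is a list of (url, text) pairs in footnote order; position is 1-based.
    Returns None when no source contains the quote.
    """
    for position, (url, text) in enumerate(sources, start=1):
        if quote in text:
            return position, url
    return None
```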
YouTube youtube-video The source is a YouTube URL Example:  “The Hillary Shimmy Song”. September 28, 2016. Retrieved September 16, 2017 – via YouTube. #3366ff
Wiki wiki-legend The quote is found in the legend of a Wikipedia image, and the footnote may come before the quote

Example:

“Fact-checkers from The Washington Post,[839] the Toronto Star,[840] and CNN[841] compiled data on “false or misleading claims” (orange background), and “false claims” (violet foreground), respectively.”

#3366ff
Wiki wiki-note Internal Wikipedia Note

Example

Clinton into “imaginary discussions” with the also-politically active Eleanor Roosevelt.[f]

Notes:

 

f. The Eleanor Roosevelt “discussions” were first reported in 1996 by The Washington Post writer Bob Woodward; they had begun from the start of Hillary Clinton’s time as first lady.[154]

#3366ff
Wiki wiki-multiple-source Wikipedia Source Citation Record Contains Multiple Sources

Example:

Calabresi, Massimo (November 7, 2011). “Hillary Clinton and the Rise of Smart Power”. Time. pp. 26–31. See also “TIME magazine editor explains Hillary Clinton’s ‘smart power'”. CNN. October 28, 2011.

Wikipedia article: Hillary Clinton

#3366ff
Match citeit-numbers-written-out The match is not found because the quoted text writes a number out in words rather than digits

Wikipedia:

to a willingness “to remold society by redefining what it means to be a human being in the twentieth century, moving into a new millennium

Source:

“Let us be willing,” she urged in conclusion, “to remold society by redefining what it means to be a human being in the 20th century, moving into a new millennium.”

#3366ff
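One way to get past this kind of mismatch is to normalize both the quote and the source to the same form before comparing. The sketch below rewrites a handful of ordinal words as numerals; the tiny `ORDINAL_WORDS` table is a hand-written assumption covering only a few centuries, and a real converter would need a much fuller mapping.

```python
import re

# Deliberately small, hand-written table for illustration only.
ORDINAL_WORDS = {
    "eighteenth": "18th",
    "nineteenth": "19th",
    "twentieth": "20th",
    "twenty-first": "21st",
}

# Longest words first so "twenty-first" is tried before any shorter entry.
_ORDINAL_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(word) for word in sorted(ORDINAL_WORDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def normalize_ordinals(text):
    """Rewrite known ordinal words as numerals, e.g. 'twentieth' -> '20th'."""
    return _ORDINAL_PATTERN.sub(
        lambda match: ORDINAL_WORDS[match.group(1).lower()], text
    )
```

Applying the same function to both sides means text that already uses numerals passes through unchanged and the two versions compare equal.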
Match citeit-text-from-source Quote text needed to be replaced from the source

Replaced Quote

“can’t .. miss” Wikipedia: (Outdated): “can’t miss”

Source:

“can’t afford to miss”

#3366ff
Match, Feature citeit-later-match A later match (2nd or 3rd) would be preferable

Example:

“genocidal taunts”

Source: Quote found in the title, but better context is found in 2nd match in the article body.

#cc6600
Match, Feature citeit-feature-added-word A word is added to the quote using brackets:

Example:

[although] “we did not find clear evidence that Secretary Clinton or her colleagues intended to violate laws”

#cc6600
Match citeit-formatting-mismatch Looks like it matches, but doesn’t because of a formatting mismatch.

Example

“contends that they are not shapes of constellations but of what might be called <i>counter constellations</i>, the irregular-shaped dark patches within the twinkling expanse of the <a href=”https://en.wikipedia.org/wiki/Milky_Way” title=”Milky Way”>Milky Way</a>”

#ff99ff
Match citeit-change-case Matches except for changing from upper to lower case or vice versa

Example

[T]he mudslides and heavy rains did not appear to have caused any significant damage to the Nazca Lines

#ff99ff
Match citeit-hyphen-change Matches except for changing hyphenation Nazca Lines  (TODO: find an example) #ff99ff
Match citeit-punctuation-change Matches except for punctuation changes: Example: quoted text ends in a comma, but Wikipedia quote uses a period

Live Example:

“Her articles were important, not because they were radically new but because they helped formulate something that had been inchoate.”[63]

Source:

Her articles were important, not because they were radically new but because they helped formulate something that had been inchoate, Professor Fox said

#ff99ff
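The case, hyphenation, and punctuation mismatches above might all be handled by comparing normalized versions of the quote and the source rather than the raw text. This is a sketch under that assumption; `normalize_for_matching` is an invented name, and the normalized text would be used only for finding the match, not for display.

```python
import re


def normalize_for_matching(text):
    """Reduce text to lowercase words separated by single spaces."""
    text = re.sub(r"\[(\w)\]", r"\1", text)         # "[T]he" -> "The"
    text = text.lower()
    text = re.sub("[-\u2010-\u2015]", " ", text)     # hyphens and dashes become spaces
    text = re.sub(r"[^\w\s]", "", text)              # drop remaining punctuation
    return " ".join(text.split())                    # collapse whitespace
```

Because hyphens become spaces, "well-known" and "well known" compare equal, but a closed-up spelling such as "wellknown" would still not match.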
Match citeit-omit-text-from-source The original source includes text that is not in the citing quote

Quote:

“Let me repeat what I have repeated for many months now, I never received nor sent any material that was marked classified.”

Original:

“Let me just repeat what I have repeated for many months now,” she said in the interview on “Meet the Press.” “I never received nor sent any material that was marked classified”

#ff99ff
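When the source contains extra words that the quote leaves out, an exact search fails, but the quote's words still appear in the source in the same order. A minimal sketch of that check follows; the function names are invented, and the test is intentionally loose: over a long source it can report false positives, so it would be better applied to a short candidate passage than to a whole article.

```python
import re


def words(text):
    """Lowercased word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def is_ordered_subsequence(quote, source_text):
    """True if every word of quote appears in source_text in the same order."""
    remaining = iter(words(source_text))
    # "word in remaining" consumes the iterator up to the first hit,
    # so later words must be found after earlier ones.
    return all(word in remaining for word in words(quote))
```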
Match, Feature citeit-feature-ellipses The quote is interrupted by ellipses and then later continued

Quote:

“There has never been a better time in history to be born a woman … this data shows just how far we still have to go.”

#ff99ff
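A quote interrupted by ellipses could be matched by splitting it at each ellipsis and looking for the pieces in order. The sketch below assumes an ellipsis is written either as the single character … or as two or more periods; the function name and return format are illustrative only.

```python
import re

# An ellipsis character or a run of two or more periods, with optional spacing.
ELLIPSIS = re.compile(r"\s*(?:\u2026|\.{2,})\s*")


def match_with_ellipses(quote, source_text):
    """Return (start, end) spans for each quote segment, or None if any is missing.

    Segments must appear in the source in the same order as in the quote.
    """
    segments = [segment for segment in ELLIPSIS.split(quote) if segment]
    spans = []
    cursor = 0
    for segment in segments:
        start = source_text.find(segment, cursor)
        if start == -1:
            return None
        cursor = start + len(segment)
        spans.append((start, cursor))
    return spans
```

The spans show where the omitted text sits in the source, which could be useful when deciding how much context to display around each piece.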
Match citeit-non-quote Although Quotation Marks are Used, the Quote is a Title or Term, not a quote.

 Examples:

“filegate”, “Hillary Doctrine”

 

TODO: remove CiteIt link from quote so normal link is visible

#996600
Offline citeit-offline-no-isbn A publication without an ISBN that is not available online

Example:

“<q cite=”” class=”citeit-non-quote citeit-offline-no-isbn”>Children’s Rights: A Legal Perspective</q>” in 1979

#3366ff
Offline citeit-offline-isbn A book that is not available online but has an ISBN

Example:

“<q cite=”https://en.wikipedia.org/wiki/Special:BookSources/978-0-8050-9511-1″ class=”citeit-offline-isbn”>pivot to Asia</q>”

#3366ff
Private citeit-paywall The document requires a subscription

Example:

“<q cite=”https://www.jstor.org/stable/795794″ class=”citeit-paywall citeit-non-quote citeit-later-match”>Children’s Policies: Abandonment and Neglect</q>”

#666699
Google Books ** google-books Google Books Nazca Lines #6600cc
Private citeit-edu An academic document requires a subscription

Example:

“<q cite=”https://meridian.allenpress.com/her/article-abstract/43/4/487/30983/Children-Under-the-Law” class=”citeit-paywall citeit-edu”>Children Under the Law</q>”

#6600cc
Twitter citeit-twitter Twitter generates its HTML using JavaScript, which I hope a future version of CiteIt can handle

Example:

<q cite=”https://thegolfnewsnet.com/golfnewsnetteam/2018/07/14/donald-trump-exercise-golf-cart-turnberry-110166/” class=”citeit-twitter”>primary form of exercise</q>

#3366ff
Archive.org Borrow citeit-archive-org-borrow The source is available to borrow electronically through Archive.org

Example:

<q cite=”https://archive.org/details/herwayhopesambit00gert” class=”citeit-archive-org-borrow citeit-later-match”>Hillarycare</q>

#6699ff
Wayback Machine citeit-archive-org Wayback Machine and other Archive.org content that doesn’t require checking out

Example:

<q cite=”https://archive.org/details/herwayhopesambit00gert” class=”citeit-archive-org-borrow citeit-later-match”>Hillarycare</q>

#33ccff
Results citeit-no-context A Match is found, but no Context is Returned

Example:

“<q cite=”https://time.com/5309425/donald-trump-kim-jong-un-summit-document-full-text/” class=”citeit-no-context”>I’m not going to rule out a military option</q>”

#ff6600
Match citeit-no-match The Quote was not found in the Cited Source

Example:

<q cite=”http://content.time.com/time/magazine/article/0,9171,2097973,00.html” class=”citeit-no-context”>convening power</q>

#cc0000
Editing citeit-better-link An Alternative Source is used Instead because it provides Better Context. Requires creating a new Footnote

Example:

<q cite=”https://www.washingtonpost.com/news/the-fix/wp/2016/07/28/here-is-hillary-clintons-presidential-nomination-acceptance-speech/” class=”citeit-better-link”>words from our Methodist faith</q>

green
Best-Practices citeit-naked-quote

The Citation is a Quote of a Quote, without the original Context

When CiteIt pulls in the context, the context is from the secondary source rather than the quoted source

Suppose Wikipedia quotes an article in the New York Times, and the New York Times quotes the President. How much of CiteIt’s 500 characters of context is from the New York Times and how much is from the President?

A naked quote would not have any context from the primary source (The President).  All of the context  CiteIt finds would be from the secondary source.

 