One of my goals is to “upgrade” all English-language Wikipedia citations to CiteIt’s Contextual Citations. This would involve:
- loading all Wikipedia articles
- finding all footnotes
- locating the URL corresponding to each footnoted source
- locating the quote, delimited by quotation marks, that corresponds to each footnote
- looking up each URL and determining which URL corresponds to the quote
- wrapping each quote with a <q cite="URL">quote</q> tag
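The last two steps could look roughly like this. It is only a minimal sketch, assuming the text behind each URL has already been downloaded; `find_source_url`, `wrap_quote`, and the example.org data are hypothetical names made up for illustration, not part of CiteIt or Pywikibot.

```python
import html
import re
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so small formatting differences don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_source_url(quote: str, pages: Dict[str, str]) -> Optional[str]:
    """Return the first URL whose already-downloaded text contains the quote."""
    needle = normalize(quote)
    for url, page_text in pages.items():
        if needle in normalize(page_text):
            return url
    return None


def wrap_quote(quote: str, url: str) -> str:
    """Wrap a quote in a <q> element whose cite attribute points at the source."""
    return f'<q cite="{html.escape(url, quote=True)}">{html.escape(quote)}</q>'


# Hypothetical data: URLs taken from one footnote, mapped to their downloaded text.
pages = {
    "https://example.org/a": "An unrelated page about something else.",
    "https://example.org/b": "She said that the   test passed\non the first try.",
}
quote = "the test passed on the first try"

url = find_source_url(quote, pages)
if url is not None:
    print(wrap_quote(quote, url))
```

Normalizing whitespace and case before comparing is a deliberate simplification. Real sources will also differ in punctuation and typographic quotes, so an exact substring test like this one will miss some matches.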
Suggestion: pywikibot #
On the Wikipedia Idea lab site, @Whatamidoing suggested I check out the mw:New Developers site. There I found the Python Pywikibot project.
Here’s an example of using Pywikibot (Git Repository) to load a page and perform a replacement:
Wikibase Usage: Get Page #
Wikibase is a flexible knowledge base software that drives Wikidata. A sample pywikibot script for getting data from Wikibase:
import pywikibot
site = pywikibot.Site('wikipedia:en')
repo = site.data_repository() # the Wikibase repository for given site
page = repo.page_from_repository('Q91') # create a local page for the given item
item = pywikibot.ItemPage(repo, 'Q91') # a repository item
data = item.get() # get all item data from repository for this item
Creating a Custom Page Parsing Bot #
using mwparserfromhell
import pywikibot
from pywikibot import pagegenerators
from pywikibot.bot import ExistingPageBot


class MyBot(ExistingPageBot):

    # Default option values; each one can be overridden from the command line.
    update_options = {
        'text': 'This is a test text',
        'summary': 'Bot: a bot test edit with Pywikibot.'
    }

    def treat_page(self):
        """Load the given page, do some changes, and save it."""
        text = self.current_page.text
        text += '\n' + self.opt.text
        self.put_current(text, summary=self.opt.summary)


def main(*args):
    """Parse command line arguments and invoke bot."""
    options = {}
    gen_factory = pagegenerators.GeneratorFactory()
    # Option parsing
    local_args = pywikibot.handle_args(args)  # global options
    local_args = gen_factory.handle_args(local_args)  # generators options
    for arg in local_args:
        opt, sep, value = arg.partition(':')
        if opt in ('-summary', '-text'):
            options[opt[1:]] = value  # strip the leading '-' from the option name
    MyBot(generator=gen_factory.getCombinedGenerator(), **options).run()


if __name__ == '__main__':
    main()
Iterating over References #
What is the best way to iterate over citation <references>?
Here is what a potential filter would look like:
filter_references(*a, **kw)
"""Iterate over references."""
Documentation: Iterating over Tags #
This is what existing filters look like:
wikicode Module
class mwparserfromhell.wikicode.Wikicode(nodes)
Bases: mwparserfromhell.string_mixin.StringMixIn
"""A Wikicode is a container for nodes that operates like a string.
Additionally, it contains methods that can be used to extract data from or modify the nodes, implemented in an
interface similar to a list. For example, index() can get the index of a node in the list, and insert() can
add a new node at that index. The filter() series of functions is very useful for extracting and iterating over,
for example, all of the templates in the object."""
RECURSE_OTHERS = 2
append(value)
"""Insert value at the end of the list of nodes.
value can be anything parsable by parse_anything()."""
contains(obj)
"""Return whether this Wikicode object contains obj.
If obj is a Node or Wikicode object, then we search for it exactly among all of our children, recursively.
Otherwise, this method just uses __contains__() on the string."""
filter(*args, **kwargs)
"""Return a list of nodes within our list matching certain conditions.
This is equivalent to calling list() on ifilter()."""
filter_arguments(*a, **kw)
"""Iterate over arguments.
This is equivalent to filter() with forcetype set to Argument."""
filter_comments(*a, **kw)
"""Iterate over comments.
This is equivalent to filter() with forcetype set to Comment."""
filter_external_links(*a, **kw)
"""Iterate over external_links.
This is equivalent to filter() with forcetype set to ExternalLink."""
filter_headings(*a, **kw)
"""Iterate over headings.
This is equivalent to filter() with forcetype set to Heading."""
filter_html_entities(*a, **kw)
"""Iterate over html_entities.
This is equivalent to filter() with forcetype set to HTMLEntity."""
filter_tags(*a, **kw)
"""Iterate over tags.
This is equivalent to filter() with forcetype set to Tag."""
filter_templates(*a, **kw)
"""Iterate over templates.
This is equivalent to filter() with forcetype set to Template."""
filter_text(*a, **kw)
"""Iterate over text.
This is equivalent to filter() with forcetype set to Text."""
filter_wikilinks(*a, **kw)
"""Iterate over wikilinks.
This is equivalent to filter() with forcetype set to Wikilink."""
get(index)
"""Return the indexth node within the list of nodes."""