Today I fixed a bug in the URL class that prevented the display of quote context for sources that were cited more than once per post.
This was caused by a bad data-structure decision: using a dictionary, rather than a list of dictionaries.
An additional benefit of this change is that citations are de-duplicated, so each source is requested only once rather than clobbered with multiple requests.
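To illustrate why a single dictionary loses data here, below is a minimal sketch. It assumes the original dictionary was keyed by the source URL, which this post doesn't actually state, and the `quotes` data and variable names are made up for the example.

```python
# Hypothetical data: two quotes cited from the same source URL.
quotes = [
    {"url": "https://example.com/article", "quote": "first quote"},
    {"url": "https://example.com/article", "quote": "second quote"},
]

# Assumption: a dictionary keyed by URL. The second quote overwrites the first.
by_url = {}
for q in quotes:
    by_url[q["url"]] = q

# A list of dictionaries keeps one entry per quote, even for a repeated source.
as_list = [dict(q) for q in quotes]

print(len(by_url))   # 1
print(len(as_list))  # 2
```

With the list, every quote keeps its own entry, so context can be displayed for each one.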
Here’s how duplicates are detected (<a href="https://github.com/CiteIt/citeit-webservice/blob/master/app/lib/citeit_quote_context/url.py#LC109:~:text=def%20citations_list_duplicates">citations_list_duplicates</a>):
def citations_list_duplicates(self):
    from collections import Counter

    duplicate_urls = []
    urls = self.citation_urls()

    # Count how many times each URL is cited in the post
    duplicate_counter = Counter(urls)

    # Keep only the URLs that are cited more than once
    for cited_url, count in duplicate_counter.items():
        if count > 1:
            duplicate_urls.append(cited_url)
    return duplicate_urls
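The same counting approach can be tried outside the class. This is a standalone sketch, not the project's code: `list_duplicates` and the example URLs are hypothetical stand-ins for the method and for whatever `citation_urls()` returns.

```python
from collections import Counter


def list_duplicates(urls):
    """Return the URLs that appear more than once, in first-seen order."""
    return [url for url, count in Counter(urls).items() if count > 1]


# Hypothetical list of cited URLs: one source is quoted twice.
cited = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
]
print(list_duplicates(cited))  # ['https://example.com/a']
```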
Here’s how duplicate URLs are pre-fetched and cached (<a href="https://github.com/CiteIt/citeit-webservice/blob/master/app/lib/citeit_quote_context/url.py#LC127:~:text=def%20citations,-(">citations</a>):
def citations(self):
    '''Pre-fetch and cache a URL if it is found in more than one quote.

    This prevents sources from being clobbered
    with multiple requests in parallel.
    '''
    for url in self.citations_list_duplicates():
        d = Document(url)
        d.download_resource()  # request and cache result
This was Issue #26 for the web-service project, posted to GitHub.
- Here’s a test post that includes multiple quotes from the same article.