I’ve known for a while that a Unicode bug was preventing the CiteIt.net JavaScript hash function from matching the Python hash function.
I did some research into this recently and discovered that JavaScript strings use UTF-16, while my Python code uses UTF-8.
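To make the mismatch concrete, here is a minimal sketch (not part of the CiteIt.net code) that compares the UTF-16 code units JavaScript stores for a string with the UTF-8 bytes for the same string. The variable names are made up for illustration, and it assumes an environment where `TextEncoder` is available (modern browsers and recent Node.js versions).

```javascript
// Illustrative only: the same text looks different as UTF-16 units and UTF-8 bytes.
const sampleName = "Br\u00FCgger"; // "Brügger", with a precomposed ü (U+00FC)

// JavaScript strings are sequences of UTF-16 code units.
const utf16Units = [];
for (let i = 0; i < sampleName.length; i++) {
  utf16Units.push(sampleName.charCodeAt(i));
}

// TextEncoder always produces UTF-8 bytes.
const utf8Bytes = Array.from(new TextEncoder().encode(sampleName));

console.log(utf16Units.length); // 7 code units: ü is the single unit 252 (0xFC)
console.log(utf8Bytes.length);  // 8 bytes: ü becomes the two bytes 195, 188 (0xC3 0xBC)
```

Any hash computed over one representation will differ from a hash computed over the other as soon as the text contains a non-ASCII character.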
I fixed this in the WordPress client code by converting the JavaScript string to UTF-8 before hashing:
Code
Here’s the code:
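// JavaScript strings are UTF-16. Convert the hash key to UTF-8 before hashing.
var hash_key = quoteHashKey(
    citing_quote,
    citing_url,
    cited_url
);
hash_key = encode_utf8(hash_key);

// Convert a string to UTF-8: each character of the result holds one UTF-8 byte.
// Note: unescape() is deprecated, but it is still widely supported.
function encode_utf8( s ) {
    return unescape( encodeURIComponent( s ) );
}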
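Because `unescape()` is deprecated, a possible alternative is `TextEncoder`, which returns the UTF-8 bytes directly. The sketch below is not the CiteIt.net implementation: the function names are hypothetical, and it only checks that both approaches give the same bytes. Whether the bytes can replace the string depends on what the hash function accepts as input, which I am not assuming here.

```javascript
// Legacy idiom from the post: returns a "binary string" in which every
// character code is one UTF-8 byte (0-255).
function encodeUtf8Legacy(s) {
  return unescape(encodeURIComponent(s));
}

// Hypothetical alternative: ask TextEncoder for the UTF-8 bytes (a Uint8Array).
function encodeUtf8Bytes(s) {
  return new TextEncoder().encode(s);
}

// Hypothetical helper: turn the legacy binary string into a byte array
// so the two results can be compared.
function binaryStringToBytes(binary) {
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

const quoteText = "Niels Br\u00FCgger"; // contains ü (U+00FC)
const legacyBytes = binaryStringToBytes(encodeUtf8Legacy(quoteText));
const modernBytes = encodeUtf8Bytes(quoteText);

// True when both approaches produced identical UTF-8 bytes.
const sameBytes =
  legacyBytes.length === modernBytes.length &&
  legacyBytes.every((b, i) => b === modernBytes[i]);

console.log(sameBytes);
```

If the hash function only accepts strings, the legacy form remains the simpler drop-in; the byte-array form is a better fit for hash functions that take a `Uint8Array`.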
Example:
Here’s an example of a special character that now displays correctly after the fix: ü
Two recently published books—one by Ian Milligan (2019) and one edited by Niels Brügger and Ralph Schroeder (2017)—provide essential guides to help answer the question of what web archives are by describing concrete, nonhypothetical examples of how social science and humanities researchers are using web archives today. For those who have participated in web archiving activity and pondered how the records would get used, and for those who are looking to get involved in web archiving but are not sure what it takes, these two books are essential reading.
Webservice Result
Here’s the JSON file that results from calling the webservice.