pyelucidate package
Submodules
pyelucidate.pyelucidate module
-
annotation_pages(result: Optional[dict]) → Optional[str]
Generator which yields URLs for annotation pages from an Activity Streams paged result set. Works by looking for the “last” page in the paged result set and incrementing between 0 and last.
Does not request each page and examine “next” or “previous”.
For example, given an Activity Streams paged result set which contains:
{"last": "https://elucidate.example.org/annotation/w3c/services/search/body?page=3&fields=source&value=FOO&desc=1"}
Will yield one URL per page, from page=0 through page=3, with the other query parameters unchanged.
Parameters: result – Activity Streams paged result set
Returns: Activity Streams page URIs.
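The paging behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the name annotation_pages_sketch is hypothetical, and it assumes the page number is carried in a "page" query parameter, as in the example URL.

```python
from typing import Iterator, Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def annotation_pages_sketch(result: Optional[dict]) -> Iterator[str]:
    """Yield one URL per page, from page 0 up to the page named in 'last'."""
    if not result or "last" not in result:
        return
    parts = urlparse(result["last"])
    query = parse_qs(parts.query)
    # Assumption: the page index lives in the "page" query parameter.
    last_page = int(query["page"][0])
    for page in range(last_page + 1):
        query["page"] = [str(page)]
        # Rebuild the URL with only the page number changed.
        yield urlunparse(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    paged = {
        "last": "https://elucidate.example.org/annotation/w3c/services/"
        "search/body?page=3&fields=source&value=FOO&desc=1"
    }
    for url in annotation_pages_sketch(paged):
        print(url)
```

Because the URLs are computed from "last" rather than followed via "next", all of them are known up front, which is what makes parallel fetching possible.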
-
async_items_by_container(elucidate: str, container: Optional[str] = None, target_uri: Optional[str] = None, header_dict: Optional[dict] = None, **kwargs) → Optional[dict]
Asynchronously yield annotations from a query by container to Elucidate.
The container can either be provided directly or derived by hashing the target URI.
Parameters: - elucidate – Elucidate server, e.g. https://elucidate.example.org
- target_uri – URI from target source and id, e.g. ‘https://manifest.example.org/manifest/1’
- container – container path
- header_dict – dict of headers
Returns: annotation object
-
async_items_by_creator(elucidate: str, creator_id: str, **kwargs) → dict
Asynchronously yield annotations from a query by creator to Elucidate.
Async requests all of the annotation pages before yielding.
Parameters: - elucidate – Elucidate server, e.g. https://elucidate.example.org
- creator_id – URI of the annotation creator, e.g. ‘https://montague.example.org/’
Returns: annotation object
-
async_items_by_target(elucidate: str, target_uri: str, **kwargs) → dict
Asynchronously yield annotations from a query by target to Elucidate.
Async requests all of the annotation pages before yielding.
Parameters: - elucidate – Elucidate server, e.g. https://elucidate.example.org
- target_uri – URI from target source and id, e.g. ‘https://manifest.example.org/manifest/1’
Returns: annotation object
-
async_items_by_topic(elucidate: str, topic: str, **kwargs) → dict
Asynchronously yield annotations from a query by topic to Elucidate.
Does an asynchronous get for all the annotations, and then yields the annotations with optional transformation provided by the “trans_function” arg.
Parameters: - elucidate – Elucidate server, e.g. https://elucidate.example.org
- topic – URI from body source, e.g. ‘https://topics.example.org/people/mary+jones’
Returns: annotation object
-
async_manifests_by_topic(elucidate: str, topic: Optional[str] = None) → Optional[list]
Asynchronously fetch the results from a topic query to Elucidate and yield manifest URIs.
N.B. If an annotation’s target is a string rather than an object, the code assumes that manifest and canvas URI patterns follow the old DLCS/Presley API model.
Parameters: - elucidate – URL for Elucidate server, e.g. https://elucidate.example.org
- topic – URL for body source, e.g. https://topics.example.org/people/mary+jones
Returns: manifest URI
-
batch_delete_target(target_uri: str, elucidate_uri: str, dry_run: bool = True) → int
Use Elucidate’s batch delete API to delete everything with a given target id or target source URI.
https://github.com/dlcs/elucidate-server/blob/master/USAGE.md#batch-delete
Parameters: - target_uri – URI to delete
- elucidate_uri – URI of the Elucidate server, e.g. https://elucidate.example.org
- dry_run – if True, do not actually delete, just log request and return a 200
Returns: status code
-
batch_delete_topic(topic_id: str, elucidate_base: str, dry_run: bool = True) → Tuple[int, str]
Use Elucidate’s batch update APIs to delete all instances of a topic URI.
https://github.com/dlcs/elucidate-server/blob/master/USAGE.md#batch-delete
Parameters: - topic_id – topic id to delete
- elucidate_base – elucidate base URI, e.g. https://elucidate.example.org
- dry_run – if True, will simply log and then return a 200
Returns: tuple - http POST status code, JSON POSTed (as string)
-
batch_update_body(new_topic_id: str, old_topic_ids: list, elucidate_base: str, dry_run: bool = True) → Tuple[int, dict]
Use Elucidate’s bulk update APIs to replace all instances of each of a list of body source or id URIs (aka a topic) with the new URI (aka topic).
https://github.com/dlcs/elucidate-server/blob/master/USAGE.md#batch-update
Parameters: - new_topic_id – topic id to use, string
- old_topic_ids – topic ids to replace, list
- elucidate_base – elucidate base URI, e.g. https://elucidate.example.org
- dry_run – if True, will simply log JSON and URI and then return a 200
Returns: POST status code
-
create_anno(elucidate_base: str, annotation: dict, target: Optional[str] = None, container: Optional[str] = None, model: Optional[str] = 'w3c') → Tuple[int, Optional[str]]
POST an annotation to Elucidate. A container can optionally be passed; if container is None, the MD5 hash of the manifest or canvas target URI is used as the container name.
If no @context is provided, the code will insert the appropriate context based on the model.
Parameters: - elucidate_base – base URI for the annotation server, e.g. https://elucidate.example.org
- target – target for the annotation (optional), will attempt to parse anno for target if not present
- annotation – annotation object
- container – container name (optional), will use hash of target uri if not present
- model – oa or w3c
Returns: status code from Elucidate, annotation id (or none)
-
create_container(container_name: str, label: str, elucidate_uri: str) → int
Create an annotation container with a container name and label.
Parameters: - container_name – name of the container
- label – label for the container
- elucidate_uri – uri for the annotation server, including full path, e.g. https://elucidate.example.org/annotation/w3c/
Returns: POST request status code
-
delete_anno(anno_uri: str, etag: str, dry_run: bool = True) → int
Delete an individual annotation, requires etag.
Optionally, can be run as a dry run which will not delete the annotation.
Parameters: - anno_uri – URI for annotation
- etag – ETag
- dry_run – if True, log and return a 204
Returns: DELETE request status code
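The ETag requirement and the dry-run behaviour can be sketched as follows. This is an illustration under assumptions rather than the library's code: delete_anno_sketch is a hypothetical name, the ETag is assumed to be sent in an If-Match header (the usual mechanism for conditional requests under the W3C Web Annotation Protocol), and urllib stands in for whichever HTTP client the library actually uses.

```python
import logging
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def delete_anno_sketch(anno_uri: str, etag: str, dry_run: bool = True) -> int:
    """Send a conditional DELETE for one annotation, or only log it on a dry run."""
    # Assumption: the ETag is passed as If-Match so a stale delete is rejected.
    request = Request(anno_uri, method="DELETE", headers={"If-Match": etag})
    if dry_run:
        logging.info("Dry run, would DELETE %s with If-Match %s", anno_uri, etag)
        return 204
    try:
        with urlopen(request) as response:
            return response.status
    except HTTPError as error:
        # urllib raises on 4xx/5xx; return the code so callers always get an int.
        return error.code
```

Defaulting dry_run to True means a caller has to opt in explicitly before anything is removed.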
-
fetch(url: str, session: aiohttp.client.ClientSession) → dict
Asynchronously fetch a url, using specified ClientSession.
-
fetch_all(urls: list, connector_limit: int = 5) → _asyncio.Future
Launch async requests for all web pages in list of urls.
Parameters: - urls – list of URLs to fetch
- connector_limit – integer for max parallel connections
Returns: results from requests
-
format_results(annotation_list: Optional[list], request_uri: str) → Optional[dict]
Takes a list of annotations and returns as a standard Presentation API Annotation List.
Parameters: - annotation_list – list of annotations
- request_uri – the URI to use for the @id
Returns: dict or None
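For orientation, a IIIF Presentation API 2 Annotation List is a small JSON-LD wrapper around a list of annotations. The sketch below shows that shape. The name format_results_sketch is hypothetical, and returning None for an empty or missing list is an assumption based on the Optional return type.

```python
from typing import Optional


def format_results_sketch(annotation_list: Optional[list], request_uri: str) -> Optional[dict]:
    """Wrap annotations in a IIIF Presentation API 2 AnnotationList."""
    # Assumption: nothing to wrap means None rather than an empty list.
    if not annotation_list:
        return None
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": request_uri,
        "@type": "sc:AnnotationList",
        "resources": annotation_list,
    }


if __name__ == "__main__":
    annos = [{"@id": "https://elucidate.example.org/annotation/oa/abc/1"}]
    print(format_results_sketch(annos, "https://example.org/list/1"))
```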
-
gen_search_by_container_uri(elucidate_base: str, target_uri: Optional[str], model: str = 'w3c') → Optional[str]
Return the annotation container uri for a target. Assumes that the container URI is an md5 hash of the target URI (as per current DLCS general practice).
This URI can be passed to other functions to return the result of the query.
Parameters: - elucidate_base – base URI for the annotation server, e.g. https://elucidate.example.org
- target_uri – target URI to search for, e.g. IIIF Presentation API manifest or canvas URI
- model – oa or w3c
Returns: uri
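The hash-to-container convention can be illustrated as below. This is a sketch, not the library's code: container_uri_sketch is a hypothetical name, and the "/annotation/<model>/<hash>/" path layout is an assumption inferred from the example server paths elsewhere on this page.

```python
import hashlib
from typing import Optional


def container_uri_sketch(elucidate_base: str, target_uri: Optional[str], model: str = "w3c") -> Optional[str]:
    """Build a container URI from the MD5 hex digest of the target URI."""
    if not target_uri:
        return None
    # The container name is the hex digest of the UTF-8 encoded target URI.
    container = hashlib.md5(target_uri.encode("utf-8")).hexdigest()
    # Assumption: containers live under /annotation/<model>/ on the server.
    return f"{elucidate_base.rstrip('/')}/annotation/{model}/{container}/"


if __name__ == "__main__":
    print(container_uri_sketch(
        "https://elucidate.example.org",
        "https://manifest.example.org/manifest/1",
    ))
```

Since the container name is derived deterministically from the target, no search request is needed to locate the annotations for a given manifest or canvas.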
-
gen_search_by_target_uri(target_uri: Optional[str], elucidate_base: str, model: str = 'w3c', field=None) → Optional[str]
Returns a search URI for searching Elucidate for a target using Elucidate’s basic search API.
This URI can be passed to other functions to return the result of the query.
Parameters: - model – oa or w3c, defaults to w3c.
- elucidate_base – base URI for the annotation server, e.g. https://elucidate.example.org
- target_uri – target URI to search for, e.g. a IIIF Presentation API canvas or manifest URI
- field – list of fields to search on, defaults to both source and id
Returns: uri
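A search URI of this kind might be assembled as in the following sketch. The function name is hypothetical, and the "services/search/target" path with "fields" and "value" parameters is an assumption modelled on the body search URL shown in the annotation_pages example; check the Elucidate usage documentation for the exact form.

```python
from typing import List, Optional
from urllib.parse import urlencode


def search_by_target_uri_sketch(
    target_uri: Optional[str],
    elucidate_base: str,
    model: str = "w3c",
    field: Optional[List[str]] = None,
) -> Optional[str]:
    """Build a search-by-target URI, searching source and id by default."""
    if not target_uri:
        return None
    fields = field or ["source", "id"]
    # urlencode percent-encodes the target URI so it is safe as a query value.
    query = urlencode({"fields": ",".join(fields), "value": target_uri})
    # Assumption: the search endpoint path mirrors the body search example.
    return f"{elucidate_base.rstrip('/')}/annotation/{model}/services/search/target?{query}"


if __name__ == "__main__":
    print(search_by_target_uri_sketch(
        "https://manifest.example.org/manifest/1",
        "https://elucidate.example.org",
    ))
```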
-
get_items(uri: str) → Optional[dict]
Page through an ActivityStreams paged result set, yielding each page’s items one at a time.
Parameters: uri – Request URI, e.g. provided by gen_search_by_target_uri()
Returns: item
-
identify_target(annotation_content: dict) → Optional[str]
Identify the base level target for an annotation, for output.
If the annotation has multiple targets, return just the base level target for the first.
Parameters: annotation_content – annotation dict
Returns: uri
-
iiif_batch_delete_by_manifest(manifest_uri: str, elucidate_uri: str, dry_run: bool = True) → bool
Provides a IIIF aware wrapper around the batch_delete_target function. Requests a IIIF Presentation API manifest and deletes all of the annotations with the canvas or the manifest URIs as their target.
Use Elucidate’s batch delete API to delete everything with a given target id or target source URI.
https://github.com/dlcs/elucidate-server/blob/master/USAGE.md#batch-delete
Parameters: - manifest_uri – URI of IIIF Presentation API manifest (must be de-referenceable)
- elucidate_uri – base URI for Elucidate, e.g. https://elucidate.example.org
- dry_run – if True, will not actually delete the content
Returns: boolean for status, True if no errors, False if error on any delete operation.
-
iiif_iterative_delete_by_manifest(manifest_uri: str, elucidate_uri: str, method: str = 'search', dry_run: bool = True) → bool
Provides a IIIF aware wrapper around the iterative_delete_by_target function.
Iteratively delete all annotations for every canvas in a IIIF Presentation manifest and for the IIIF Presentation API manifest itself.
Requests annotations either by container or by target URI and iteratively deletes the annotations by id, one at a time, using HTTP DELETE.
Does not use Elucidate’s batch delete APIs.
Parameters: - dry_run – if True, will not actually delete
- method – identify the annotations to delete via container (hash) or search (Elucidate query)
- manifest_uri – URI for IIIF Presentation API manifest.
- elucidate_uri – Elucidate base URI, e.g. https://elucidate.example.org
Returns: boolean success or fail
-
iiif_iterative_delete_by_manifest_async_get(manifest_uri: str, elucidate_uri: str, dry_run: bool = True) → bool
Delete all annotations for every canvas in a IIIF manifest and for the manifest.
Uses asynchronous code to parallel get the search results to build the annotation list.
N.B. does NOT do an async DELETE. Delete is sequential.
Parameters: - dry_run – if True, will not actually delete, just prints URIs
- manifest_uri – uri for IIIF manifest
- elucidate_uri – Elucidate base uri
Returns: boolean success or fail
-
item_ids(item: dict) → Optional[str]
Small helper function to yield identifier URI(s) for item from an Activity Streams item. Will yield both ‘@id’ and ‘id’ values.
Parameters: item – Item from an activity streams page
Returns: uri
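The reason for checking both keys is that Open Annotation style JSON-LD uses ‘@id’ while W3C Web Annotation uses ‘id’. A minimal sketch of such a helper, with the hypothetical name item_ids_sketch and the assumption that missing or empty values are skipped:

```python
from typing import Iterator


def item_ids_sketch(item: dict) -> Iterator[str]:
    """Yield the identifier(s) of an item, whichever key style it uses."""
    for key in ("@id", "id"):
        value = item.get(key)
        # Assumption: absent or empty identifiers are silently skipped.
        if value:
            yield value


if __name__ == "__main__":
    print(list(item_ids_sketch({"id": "https://elucidate.example.org/annotation/w3c/abc/1"})))
```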
-
items_by_body_source(elucidate: str, topic: str, strict: bool = True) → dict
Generator to yield annotations from query to Elucidate by body source.
For example, for a W3C web annotation, with body:
{"body": [ { "type": "SpecificResource", "format": "application/html", "creator": "https://montague.example.org/", "generator": "https://montague.example.org//nlp/", "purpose": "tagging", "source": "https://www.example.org/themes/foo" } ]}
This function will query Elucidate for all annotations with body id or body source == “https://www.example.org/themes/foo”.
If strict = False, the match is looser and can also return annotations whose body source is not an exact match.
If strict = True, only annotations with an exact match on the body source will be returned.
Parameters: - elucidate – URL for Elucidate server, e.g. https://elucidate.example.org
- topic – URI for body source, e.g. https://www.example.org/themes/foo
- strict – if True, only return annotations with an exact match on the body source.
Returns: annotation dict
-
iterative_delete_by_target(target: str, elucidate_base: str, search_method: str = 'container', dryrun: bool = True) → bool
Delete all annotations in a container for a target URI. Works by querying for the annotations and then iteratively deleting them one at a time.
Note that this is not an operation using Elucidate’s batch delete APIs.
Negative: could be slow, and involve many consecutive HTTP requests.
Positive: as the code is handling the annotations one at a time, it will not time out with very large result sets.
The function can build the list of annotations to delete using either:
- the Elucidate search by target API, or
- a hash of the target URI to get a container URI.
N.B. choosing the container method assumes that the container ID is an MD5 hash of the target URI.
Parameters: - dryrun – if True, will not actually delete, just logs and returns True (for success)
- search_method – ‘container’ (hash of target URI) or ‘search’ (Elucidate query by target)
- target – target URI
- elucidate_base – base URI for Elucidate, e.g. https://elucidate.example.org
Returns: boolean success or fail, True if no errors on any request.
-
iterative_delete_by_target_async_get(target: str, elucidate_base: str, dryrun: bool = True) → bool
Delete all annotations in a container for a target URI. Works by querying for the annotations and then iteratively deleting them one at a time. Not a bulk delete operation using Elucidate’s bulk APIs.
Negative: could be slow, and involve many HTTP requests.
Positive: as the code is handling the annotations one at a time, it will not time out, however large the result set.
Asynchronous query using the Elucidate search by target API to fetch the list of annotations to delete.
DELETE is not asynchronous, but sequential.
Parameters: - dryrun – if True, will not actually delete, just logs and returns True (for success)
- target – target uri
- elucidate_base – base URI for Elucidate, e.g. https://elucidate.example.org
Returns: boolean success or fail, True if no errors on any request.
-
mirador_oa(w3c_body: dict) → dict
Transform a single W3C Web Annotation body (e.g. as produced by Montague) and return it formatted for Open Annotation in the Mirador client.
Parameters: w3c_body – annotation body
Returns: transformed annotation body
-
parent_from_annotation(content: dict) → Optional[str]
Parse a W3C web annotation and attempt to yield the URI for the parent object the annotation target is part of.
A typical use would be to return the parent IIIF Presentation API manifest URI for an annotation on a IIIF Presentation API canvas or fragment of a canvas.
If the target is a string rather than an object, the code assumes that manifest and canvas URI patterns follow the model used by the RESTful DLCS API.
On this pattern, a canvas with URI:
will have a parent manifest with URI:
This assumption may not, and probably will not, hold for other sources.
If the annotation has a “dcterms:isPartOf” field within the target, the value of “dcterms:isPartOf” will be returned. If there is a list of annotation targets, the first parent will be returned.
Parameters: content – annotation object
Returns: target parent URI
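The “dcterms:isPartOf” branch can be sketched as below. This covers only targets given as objects; the string-target case depends on the DLCS URI pattern and is deliberately left out. The name parent_from_annotation_sketch is hypothetical, and accepting either a plain URI or an object with an id as the “dcterms:isPartOf” value is an assumption.

```python
from typing import Optional


def parent_from_annotation_sketch(content: dict) -> Optional[str]:
    """Return the first dcterms:isPartOf value found among the targets."""
    target = content.get("target")
    # Normalise a single target to a list so both shapes share one loop.
    targets = target if isinstance(target, list) else [target]
    for candidate in targets:
        if isinstance(candidate, dict) and "dcterms:isPartOf" in candidate:
            parent = candidate["dcterms:isPartOf"]
            # Assumption: the parent is either a URI string or an object with an id.
            if isinstance(parent, dict):
                return parent.get("id") or parent.get("@id")
            return parent
    return None


if __name__ == "__main__":
    anno = {
        "target": {
            "id": "https://manifest.example.org/canvas/1",
            "dcterms:isPartOf": {"id": "https://manifest.example.org/manifest/1"},
        }
    }
    print(parent_from_annotation_sketch(anno))
```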
-
parents_by_topic(elucidate: str, topic: str) → Optional[str]
Generator which parses results from an Elucidate topic search request, and yields parent/manifest URIs.
If the target is a string rather than an object, the code assumes that manifest and canvas URI patterns follow the model used by the RESTful DLCS API.
On this pattern, a canvas with URI:
will have a parent manifest with URI:
This assumption may not, and probably will not, hold for other sources.
Parameters: - elucidate – URL for Elucidate server, e.g. https://elucidate.example.org
- topic – URL for body source, e.g. https://topics.example.org/people/mary+jones
Returns: manifest URI
-
read_anno(anno_uri: str) → Tuple[Optional[str], Optional[str]]
GET an annotation from Elucidate, returns a tuple of annotation content and ETag.
Parameters: anno_uri – URI for annotation
Returns: annotation content, etag
-
remove_keys(d: dict, keys: list) → dict
Remove keys from a dictionary.
Parameters: - d – dict to edit
- keys – list of keys to remove
Returns: dict with keys removed
-
set_query_field(url: str, field: str, value: Union[int, str], replace: bool = False)
Parse out the different parts of a URL, and optionally replace a query string parameter, before returning the unparsed new URL.
Parameters: - url – URL to parse
- field – field where the value should be replaced
- value – replacement value
- replace – boolean, if True, replace query string parameter
Returns: unparsed URL
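The parse, modify and unparse cycle might look like the following standard-library sketch. The name set_query_field_sketch is hypothetical, and appending the value as an additional parameter when replace is False is an assumption about the intended behaviour.

```python
from typing import Union
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def set_query_field_sketch(url: str, field: str, value: Union[int, str], replace: bool = False) -> str:
    """Set or append a query string parameter and return the rebuilt URL."""
    parts = urlparse(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    if replace or field not in query:
        # Overwrite any existing values for this field.
        query[field] = [str(value)]
    else:
        # Assumption: without replace, the value is added alongside existing ones.
        query[field].append(str(value))
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    print(set_query_field_sketch(
        "https://elucidate.example.org/search?page=3&desc=1", "page", 0, replace=True
    ))
```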
-
target_extract(json_dict: dict, fake_selector: bool = False) → Optional[str]
Extract the target and turn into a simple ‘on’.
Optionally, fake a selector, e.g. for whole canvas annotations, generate a target XYWH bounding box at top left.
Parameters: - fake_selector – if True, create a top left 50px box and associate with that.
- json_dict – annotation content as dictionary
Returns: string for the target URI
-
transform_annotation(item: dict, flatten_at_ids: bool = True, transform_function: Optional[Callable] = None) → Optional[dict]
Transform an annotation given an arbitrary function that is passed in.
For example, W3C to OA using “mirador_oa”.
The function will remove keys not used in the Open Annotation model.
If no transform_function is provided the annotation will be returned unaltered.
Parameters: - item – annotation
- flatten_at_ids – if True replace @id dict with simple “@id” : “foo”
- transform_function – function to pass the annotation through
Returns: the transformed annotation (unaltered if no transform_function is provided)
-
update_anno(anno_uri: str, anno_content: dict, etag: str, dry_run: bool = True) → int
Update an individual annotation, requires etag.
Optionally, can be run as a dry run which will not update the annotation but will return a 200.
Parameters: - anno_uri – URI for annotation
- anno_content – the annotation content
- etag – ETag
- dry_run – if True, log and return a 200
Returns: PUT request status code