pyelucidate package

Submodules

pyelucidate.pyelucidate module

annotation_pages(result: Optional[dict]) → Optional[str][source]

Generator which yields URLs for annotation pages from an Activity Streams paged result set. Works by looking for the “last” page in the paged result set and incrementing between 0 and last.

Does not request each page and examine “next” or “previous”.

For example, given an Activity Streams paged result set which contains:

{"last": "https://elucidate.example.org/annotation/w3c/services/search/body?page=3&fields=source&value=FOO&desc=1"}

Will yield:

https://elucidate.example.org/annotation/w3c/services/search/body?fields=source&value=FOO&desc=1&page=0

https://elucidate.example.org/annotation/w3c/services/search/body?fields=source&value=FOO&desc=1&page=1

https://elucidate.example.org/annotation/w3c/services/search/body?fields=source&value=FOO&desc=1&page=2

https://elucidate.example.org/annotation/w3c/services/search/body?fields=source&value=FOO&desc=1&page=3

Parameters:result – Activity Streams paged result set
Returns:Activity Streams page URIs.
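
The page enumeration described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the name annotation_pages_sketch is hypothetical, and it assumes the "last" URL carries an integer "page" query parameter, as in the example.

```python
from typing import Iterator, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def annotation_pages_sketch(result: Optional[dict]) -> Iterator[str]:
    """Yield one URL per page, from page 0 up to the page named in "last"."""
    if not result or "last" not in result:
        return
    parts = urlsplit(result["last"])
    query = parse_qsl(parts.query, keep_blank_values=True)
    params = dict(query)
    if "page" not in params:
        return
    # Read the highest page number from the "last" URL.
    last_page = int(params["page"])
    # Keep every other parameter in its original order.
    other = [(key, value) for key, value in query if key != "page"]
    for page in range(last_page + 1):
        new_query = urlencode(other + [("page", str(page))])
        yield urlunsplit(parts._replace(query=new_query))
```

No page is requested here; the URLs are derived purely from the "last" value, which matches the note that "next" and "previous" are not examined.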
async_items_by_container(elucidate: str, container: Optional[str] = None, target_uri: Optional[str] = None, header_dict: Optional[dict] = None, **kwargs) → Optional[dict][source]

Asynchronously yield annotations from a query by container to Elucidate.

The container can either be derived by hashing the target URI, or provided explicitly.

Parameters:
Returns:

annotation object

async_items_by_creator(elucidate: str, creator_id: str, **kwargs) → dict[source]

Asynchronously yield annotations from a query by creator to Elucidate.

Async requests all of the annotation pages before yielding.

Parameters:
Returns:

annotation object

async_items_by_target(elucidate: str, target_uri: str, **kwargs) → dict[source]

Asynchronously yield annotations from a query by target to Elucidate.

Async requests all of the annotation pages before yielding.

Parameters:
Returns:

annotation object

async_items_by_topic(elucidate: str, topic: str, **kwargs) → dict[source]

Asynchronously yield annotations from a query by topic to Elucidate.

Does an asynchronous get for all the annotations, and then yields the annotations with optional transformation provided by the “trans_function” arg.

Parameters:
Returns:

annotation object

async_manifests_by_topic(elucidate: str, topic: Optional[str] = None) → Optional[list][source]

Asynchronously fetch the results from a topic query to Elucidate and yield manifest URIs.

N.B. assumes that, if passed a string for target rather than an object, manifest and canvas URI patterns follow the old API DLCS/Presley model.

Parameters:
Returns:

manifest URI

batch_delete_target(target_uri: str, elucidate_uri: str, dry_run: bool = True) → int[source]

Use Elucidate’s batch delete API to delete everything with a given target id or target source URI.

https://github.com/dlcs/elucidate-server/blob/master/USAGE.md#batch-delete

Parameters:
  • target_uri – URI to delete
  • elucidate_uri – URI of the Elucidate server, e.g. https://elucidate.example.org
  • dry_run – if True, do not actually delete, just log request and return a 200
Returns:

status code

batch_delete_topic(topic_id: str, elucidate_base: str, dry_run: bool = True) → Tuple[int, str][source]

Use Elucidate’s batch update APIs to delete all instances of a topic URI.

https://github.com/dlcs/elucidate-server/blob/master/USAGE.md#batch-delete

Parameters:
  • topic_id – topic id to delete
  • elucidate_base – elucidate base URI, e.g. https://elucidate.example.org
  • dry_run – if True, will simply log and then return a 200
Returns:

tuple - http POST status code, JSON POSTed (as string)

batch_update_body(new_topic_id: str, old_topic_ids: list, elucidate_base: str, dry_run: bool = True) → Tuple[int, dict][source]

Use Elucidate’s bulk update APIs to replace all instances of each of a list of body source or id URIs (aka a topic) with the new URI (aka topic).

https://github.com/dlcs/elucidate-server/blob/master/USAGE.md#batch-update

Parameters:
  • new_topic_id – topic id to use, string
  • old_topic_ids – topic ids to replace, list
  • elucidate_base – elucidate base URI, e.g. https://elucidate.example.org
  • dry_run – if True, will simply log JSON and URI and then return a 200
Returns:

POST status code

create_anno(elucidate_base: str, annotation: dict, target: Optional[str] = None, container: Optional[str] = None, model: Optional[str] = 'w3c') → Tuple[int, Optional[str]][source]

POST an annotation to Elucidate. A container can optionally be passed; if container is None, the MD5 hash of the manifest or canvas target URI is used as the container name.

If no @context is provided, the code will insert the appropriate context based on the model.

Parameters:
  • elucidate_base – base URI for the annotation server, e.g. https://elucidate.example.org
  • target – target for the annotation (optional), will attempt to parse anno for target if not present
  • annotation – annotation object
  • container – container name (optional), will use hash of target uri if not present
  • model – oa or w3c
Returns:

status code from Elucidate, annotation id (or none)

create_container(container_name: str, label: str, elucidate_uri: str) → int[source]

Create an annotation container with a container name and label.

Parameters:
Returns:

POST request status code

delete_anno(anno_uri: str, etag: str, dry_run: bool = True) → int[source]

Delete an individual annotation, requires etag.

Optionally, can be run as a dry run which will not delete the annotation.

Parameters:
  • anno_uri – URI for annotation
  • etag – ETag
  • dry_run – if True, log and return a 204
Returns:

DELETE request status code
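
A minimal sketch of an ETag-guarded delete with a dry-run switch. It assumes the ETag is sent in an If-Match header (standard HTTP conditional-request behaviour) and uses the standard library rather than whatever HTTP client the package itself uses; delete_anno_sketch is a hypothetical name.

```python
import logging
import urllib.error
import urllib.request


def delete_anno_sketch(anno_uri: str, etag: str, dry_run: bool = True) -> int:
    """DELETE one annotation, sending its ETag in an If-Match header."""
    # Assumption: the server checks If-Match against the current ETag.
    headers = {"If-Match": etag}
    if dry_run:
        # Nothing is sent; log what would have happened and report a 204.
        logging.info("Dry run: would DELETE %s with %s", anno_uri, headers)
        return 204
    request = urllib.request.Request(anno_uri, headers=headers, method="DELETE")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; surface the status code instead.
        return error.code
```

Because dry_run defaults to True, a caller has to opt in explicitly before anything is deleted.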

fetch(url: str, session: aiohttp.client.ClientSession) → dict[source]

Asynchronously fetch a url, using specified ClientSession.

fetch_all(urls: list, connector_limit: int = 5) → _asyncio.Future[source]

Launch async requests for all web pages in list of urls.

Parameters:
  • urls – list of URLs to fetch
  • connector_limit – integer for max parallel connections

Returns:

results from requests
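
The idea of capping parallel requests can be illustrated without any HTTP library. The sketch below swaps in an asyncio.Semaphore in place of a connection-pool limit, and takes the per-URL coroutine as a parameter; fetch_all_sketch, fetch_one and fake_fetch are hypothetical names, not part of the package.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fetch_all_sketch(
    urls: List[str],
    fetch_one: Callable[[str], Awaitable[dict]],
    limit: int = 5,
) -> List[dict]:
    """Run fetch_one for every URL, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> dict:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await fetch_one(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def fake_fetch(url: str) -> dict:
    """Hypothetical stand-in for an HTTP GET that returns parsed JSON."""
    await asyncio.sleep(0)
    return {"url": url}
```

A caller would run it with, for example, asyncio.run(fetch_all_sketch(urls, fake_fetch, limit=2)).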

format_results(annotation_list: Optional[list], request_uri: str) → Optional[dict][source]

Takes a list of annotations and returns as a standard Presentation API Annotation List.

Parameters:
  • annotation_list – list of annotations
  • request_uri – the URI to use for the @id

Returns:

dict or None
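
As a rough illustration, wrapping a list of annotations in a IIIF Presentation API 2 Annotation List might look like the following. The name format_results_sketch is hypothetical, and returning None for an empty or missing list is an assumption based on the Optional return type.

```python
from typing import Optional


def format_results_sketch(annotation_list: Optional[list], request_uri: str) -> Optional[dict]:
    """Wrap annotations in a IIIF Presentation API 2 AnnotationList."""
    if not annotation_list:
        # Assumption: an empty or missing list yields None.
        return None
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@type": "sc:AnnotationList",
        "@id": request_uri,
        "resources": annotation_list,
    }
```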

gen_search_by_container_uri(elucidate_base: str, target_uri: Optional[str], model: str = 'w3c') → Optional[str][source]

Return the annotation container uri for a target. Assumes that the container URI is an md5 hash of the target URI (as per current DLCS general practice).

This URI can be passed to other functions to return the result of the query.

Parameters:
  • elucidate_base – base URI for the annotation server, e.g. https://elucidate.example.org
  • target_uri – target URI to search for, e.g. IIIF Presentation API manifest or canvas URI
  • model – oa or w3c
Returns:

uri
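
The hashing convention can be sketched as below. The MD5 step follows the description above; the URI layout (base, then "annotation", the model, and the hash) is an assumption inferred from the example search URLs in this page, and container_uri_sketch is a hypothetical name.

```python
import hashlib
from typing import Optional


def container_uri_sketch(elucidate_base: str, target_uri: Optional[str], model: str = "w3c") -> Optional[str]:
    """Build a container URI from the MD5 hex digest of the target URI."""
    if not target_uri:
        return None
    container = hashlib.md5(target_uri.encode("utf-8")).hexdigest()
    # Assumed layout: <base>/annotation/<model>/<container>/
    return f"{elucidate_base.rstrip('/')}/annotation/{model}/{container}/"
```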

gen_search_by_target_uri(target_uri: Optional[str], elucidate_base: str, model: str = 'w3c', field=None) → Optional[str][source]

Returns a search URI for searching Elucidate for a target using Elucidate’s basic search API.

This URI can be passed to other functions to return the result of the query.

Parameters:
  • model – oa or w3c, defaults to w3c.
  • elucidate_base – base URI for the annotation server, e.g. https://elucidate.example.org
  • target_uri – target URI to search for, e.g. a IIIF Presentation API canvas or manifest URI
  • field – list of fields to search on, defaults to both source and id
Returns:

uri
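
A hedged sketch of how such a search URI could be assembled. The path "services/search/target" and the "fields" and "value" parameters are assumed by analogy with the body search URL shown for annotation_pages; percent-encoding the value is also an assumption, and search_by_target_sketch is a hypothetical name.

```python
from typing import List, Optional
from urllib.parse import quote


def search_by_target_sketch(
    target_uri: Optional[str],
    elucidate_base: str,
    model: str = "w3c",
    field: Optional[List[str]] = None,
) -> Optional[str]:
    """Build a search-by-target URI, defaulting to both source and id."""
    if not target_uri:
        return None
    fields = ",".join(field or ["source", "id"])
    # Assumed endpoint layout, mirroring the body search example.
    base = f"{elucidate_base.rstrip('/')}/annotation/{model}/services/search/target"
    return f"{base}?fields={fields}&value={quote(target_uri, safe='')}"
```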

get_items(uri: str) → Optional[dict][source]

Page through an ActivityStreams paged result set, yielding each page’s items one at a time.

Parameters:uri – Request URI, e.g. provided by gen_search_by_target_uri()
Returns:item
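
Paging through a result set can be sketched with the HTTP call factored out. The sketch assumes the collection exposes a "first" page (embedded or as a URL) and that each page carries "items" and an optional "next" link; get_items_sketch and fetch_json are hypothetical names.

```python
from typing import Callable, Iterator, Optional


def get_items_sketch(uri: str, fetch_json: Callable[[str], Optional[dict]]) -> Iterator[dict]:
    """Follow "first" and then "next" links, yielding items one at a time."""
    collection = fetch_json(uri)
    if not collection:
        return
    page = collection.get("first")
    while page:
        if isinstance(page, str):
            # The page is given as a URL: dereference it.
            page = fetch_json(page)
            if not page:
                return
        for item in page.get("items", []):
            yield item
        page = page.get("next")
```

Passing the fetch function in keeps the paging logic testable against canned responses, for example a dict's get method.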
identify_target(annotation_content: dict) → Optional[str][source]

Identify the base level target for an annotation, for output.

If the annotation has multiple targets, return just base level target for the first.

Parameters:annotation_content – annotation dict
Returns:uri
iiif_batch_delete_by_manifest(manifest_uri: str, elucidate_uri: str, dry_run: bool = True) → bool[source]

Provides a IIIF aware wrapper around the batch_delete_target function. Requests a IIIF Presentation API manifest and deletes all of the annotations with the canvas or the manifest URIs as their target.

Use Elucidate’s batch delete API to delete everything with a given target id or target source URI.

https://github.com/dlcs/elucidate-server/blob/master/USAGE.md#batch-delete

Parameters:
  • manifest_uri – URI of IIIF Presentation API manifest (must be de-referenceable)
  • elucidate_uri – base URI for Elucidate, e.g. https://elucidate.example.org
  • dry_run – if True, will not actually delete the content
Returns:

boolean for status, True if no errors, False if error on any delete operation.

iiif_iterative_delete_by_manifest(manifest_uri: str, elucidate_uri: str, method: str = 'search', dry_run: bool = True) → bool[source]

Provides a IIIF aware wrapper around the iterative_delete_by_target function.

Iteratively delete all annotations for every canvas in a IIIF Presentation manifest and for the IIIF Presentation API manifest itself.

Requests annotations either by container or by target URI and iteratively deletes the annotations by id, one at a time, using HTTP DELETE.

Does not use Elucidate’s batch delete APIs.

Parameters:
  • dry_run – if True, will not actually delete
  • method – identify the annotations to delete via container (hash) or search (Elucidate query)
  • manifest_uri – URI for IIIF Presentation API manifest
  • elucidate_uri – Elucidate base URI, e.g. https://elucidate.example.org
Returns:

boolean success or fail

iiif_iterative_delete_by_manifest_async_get(manifest_uri: str, elucidate_uri: str, dry_run: bool = True) → bool[source]

Delete all annotations for every canvas in a IIIF manifest and for the manifest.

Uses asynchronous code to parallel get the search results to build the annotation list.

N.B. does NOT do an async DELETE. Delete is sequential.

Parameters:
  • dry_run – if True, will not actually delete, just prints URIs
  • manifest_uri – uri for IIIF manifest
  • elucidate_uri – Elucidate base uri
Returns:

boolean success or fail

item_ids(item: dict) → Optional[str][source]

Small helper function to yield identifier URI(s) for an item from an Activity Streams page. Will yield both ‘@id’ and ‘id’ values.

Parameters:item – Item from an activity streams page
Returns:uri
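
A minimal sketch of the helper described above, under the hypothetical name item_ids_sketch. Skipping empty values is an assumption.

```python
from typing import Iterator


def item_ids_sketch(item: dict) -> Iterator[str]:
    """Yield the '@id' and 'id' values of an item, when present."""
    for key in ("@id", "id"):
        value = item.get(key)
        if value:
            yield value
```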
items_by_body_source(elucidate: str, topic: str, strict: bool = True) → dict[source]

Generator to yield annotations from query to Elucidate by body source.

For example, for a W3C web annotation, with body:

{"body": [
          {
            "type": "SpecificResource",
            "format": "application/html",
            "creator": "https://montague.example.org/",
            "generator": "https://montague.example.org//nlp/",
            "purpose": "tagging",
            "source": "https://www.example.org/themes/foo"
          }
      ]}

This function will query Elucidate for all annotations with body id or body source == “https://www.example.org/themes/foo”.

If strict = False, this would match both:

and

If strict = True, only annotations with an exact match on the body source will be returned.

Parameters:
Returns:

annotation dict

iterative_delete_by_target(target: str, elucidate_base: str, search_method: str = 'container', dryrun: bool = True) → bool[source]

Delete all annotations in a container for a target URI. Works by querying for the annotations and then iteratively deleting them one at a time.

Note that this is _not_ an operation using Elucidate’s batch delete APIs.

Negative: could be slow, and involve many consecutive HTTP requests

Positive: as the code is handling the annotations one at a time, it will not time out with very large result sets.

The function can build the list of annotations to delete using either:

the Elucidate search by target API,

or a hash of the target URI to get a container URI.

N.B. choosing the container method assumes that the container ID is an MD5 hash of the target URI.

Parameters:
  • dryrun – if True, will not actually delete, just logs and returns True (for success)
  • search_method – ‘container’ (hash of target URI) or ‘search’ (Elucidate query by target)
  • target – target URI
  • elucidate_base – base URI for Elucidate, e.g. https://elucidate.example.org
Returns:

boolean success or fail, True if no errors on _any_ request.
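
The one-at-a-time flow can be sketched with the read and delete steps passed in as callables. This is an illustration only: iterative_delete_sketch is a hypothetical name, the callables are assumed to behave like read_anno (returning content and ETag) and delete_anno (returning a status code), and treating 204 as the only success status is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple


def iterative_delete_sketch(
    anno_uris: Iterable[str],
    read_anno: Callable[[str], Tuple[Optional[str], Optional[str]]],
    delete_anno: Callable[[str, str, bool], int],
    dryrun: bool = True,
) -> bool:
    """Delete annotations one by one; True only if every request succeeded."""
    success = True
    for uri in anno_uris:
        # Each delete needs the annotation's current ETag.
        _, etag = read_anno(uri)
        if etag is None:
            success = False
            continue
        if delete_anno(uri, etag, dryrun) != 204:
            success = False
    return success
```

A single failed read or delete flips the result to False, but the loop carries on with the remaining annotations.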

iterative_delete_by_target_async_get(target: str, elucidate_base: str, dryrun: bool = True) → bool[source]

Delete all annotations in a container for a target uri. Works by querying for the annotations and then iteratively deleting them one at a time. Not a bulk delete operation using Elucidate’s bulk APIs.

N.B. Negative: could be slow, and involve many HTTP requests. Positive: the size of the result set does not matter; it will not time out, because the annotations are handled one at a time.

Asynchronous query using the Elucidate search by target API to fetch the list of annotations to delete.

DELETE is not asynchronous, but sequential.

Parameters:
  • dryrun – if True, will not actually delete, just logs and returns True (for success)
  • target – target uri
  • elucidate_base – base URI for Elucidate, e.g. https://elucidate.example.org
Returns:

boolean success or fail, True if no errors on _any_ request.

mirador_oa(w3c_body: dict) → dict[source]

Transform a single W3C Web Annotation Body (e.g. as produced by Montague) and return it formatted for Open Annotation in the Mirador client.

Parameters:w3c_body – annotation body
Returns:transformed annotation body
parent_from_annotation(content: dict) → Optional[str][source]

Parse W3C web annotation and attempt to yield URI for parent object the annotation target is part of.

A typical use would be to return the parent IIIF Presentation API manifest URI for an annotation on a IIIF Presentation API canvas or fragment of a canvas.

The code assumes that, if passed a string for target rather than an object, manifest and canvas URI patterns follow the model used by the RESTful DLCS API.

On this pattern, a canvas with URI:

will have a parent manifest with URI:

This assumption may not, and probably will not, hold for other sources.

If the annotation has a “dcterms:isPartOf” field within the target, the value of “dcterms:isPartOf” will be returned. If there is a list of annotation targets, the first parent will be returned.

Parameters:content – annotation object
Returns:target parent URI
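
The “dcterms:isPartOf” lookup can be sketched as follows. The sketch covers only object targets; the string-target case depends on a URI pattern not shown here and is left out, so it returns None. parent_from_annotation_sketch is a hypothetical name, and accepting either a plain URI or an object with an id as the field value is an assumption.

```python
from typing import Optional


def parent_from_annotation_sketch(content: dict) -> Optional[str]:
    """Return the first dcterms:isPartOf value found among the targets."""
    targets = content.get("target")
    if not isinstance(targets, list):
        targets = [targets]
    for target in targets:
        if isinstance(target, dict) and "dcterms:isPartOf" in target:
            part_of = target["dcterms:isPartOf"]
            # Assumption: the value is a URI string or an object with an id.
            if isinstance(part_of, dict):
                return part_of.get("id") or part_of.get("@id")
            return part_of
    # String targets are not handled in this sketch.
    return None
```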
parents_by_topic(elucidate: str, topic: str) → Optional[str][source]

Generator parses results from an Elucidate topic search request, and yields parent/manifest URIs.

The code assumes that, if passed a string for target rather than an object, manifest and canvas URI patterns follow the model used by the RESTful DLCS API.

On this pattern, a canvas with URI:

will have a parent manifest with URI:

This assumption may not, and probably will not, hold for other sources.

Parameters:
Returns:

manifest URI

read_anno(anno_uri: str) → Tuple[Optional[str], Optional[str]][source]

GET an annotation from Elucidate, returns a tuple of annotation content and ETag

Parameters:anno_uri – URI for annotation
Returns:annotation content, etag
remove_keys(d: dict, keys: list) → dict[source]

Remove keys from a dictionary.

Parameters:
  • d – dict to edit
  • keys – list of keys to remove
Returns:

dict with keys removed

set_query_field(url: str, field: str, value: Union[int, str], replace: bool = False)[source]

Parse out the different parts of a URL, and optionally replace a query string parameter, before returning the unparsed new URL.

Parameters:
  • url – URL to parse
  • field – field where the value should be replaced
  • value – replacement value
  • replace – boolean, if True, replace query string parameter
Returns:

unparsed URL
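
One way to express this with urllib.parse is sketched below. The behaviour when replace is False (appending the field alongside any existing value) is an assumption, and set_query_field_sketch is a hypothetical name.

```python
from typing import Union
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_field_sketch(url: str, field: str, value: Union[int, str], replace: bool = False) -> str:
    """Set a query string parameter and return the rebuilt URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if replace:
        # Drop any existing values for this field first.
        query = [(key, val) for key, val in query if key != field]
    query.append((field, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```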

target_extract(json_dict: dict, fake_selector: bool = False) → Optional[str][source]

Extract the target and turn into a simple ‘on’.

Optionally, fake a selector, e.g. for whole canvas annotations, generate a target XYWH bounding box at top left.

Parameters:
  • fake_selector – if True, create a top left 50px box and associate with that.
  • json_dict – annotation content as dictionary
Returns:

string for the target URI

transform_annotation(item: dict, flatten_at_ids: bool = True, transform_function: Optional[Callable] = None) → Optional[dict][source]

Transform an annotation given an arbitrary function that is passed in.

For example, W3C to OA using “mirador_oa”.

The function will remove keys not used in the Open Annotation model.

If no transform_function is provided the annotation will be returned unaltered.

Parameters:
  • item – annotation
  • flatten_at_ids – if True replace @id dict with simple “@id” : “foo”
  • transform_function – function to pass the annotation through
Returns:

update_anno(anno_uri: str, anno_content: dict, etag: str, dry_run: bool = True) → int[source]

Update an individual annotation, requires etag.

Optionally, can be run as a dry run which will not update the annotation but will return a 200.

Parameters:
  • anno_uri – URI for annotation
  • anno_content – the annotation content
  • etag – ETag
  • dry_run – if True, log and return a 200
Returns:

PUT request status code

uri_contract(uri: str) → Optional[str][source]

Contract a URI to just the scheme, netloc, and path.

For example, for:

returns:

https://example.org/foo

Parameters:uri – URI to contract
Returns:contracted URI
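
A minimal sketch of the contraction using urllib.parse, dropping the query string and fragment. The name uri_contract_sketch is hypothetical, and returning None for an empty input is an assumption based on the Optional return type.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def uri_contract_sketch(uri: Optional[str]) -> Optional[str]:
    """Keep only the scheme, netloc, and path of a URI."""
    if not uri:
        return None
    parts = urlsplit(uri)
    # Blank out the query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

With a hypothetical input such as https://example.org/foo?bar=1#xywh=0,0,10,10, the sketch gives https://example.org/foo.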

Module contents