Aarne-Thompson-Uther (ATU) Search
Linked data search over ATU-tagged resources.
Wikipedia page: https://en.wikipedia.org/wiki/Godfather_Death
DBpedia page: https://live.dbpedia.org/page/Godfather_Death
The DBpedia page gives us, for example:
https://live.dbpedia.org/page/Category:The_Devil_in_fairy_tales (rdf:type skos:Concept ; rdfs:label "The Devil in fairy tales" (en))
dbp:aarneThompsonGrouping "ATU 332" (en) ; dbp:country "Germany" (en) ; dbp:folkTaleName "Godfather Death" (en) ; dct:subject dbc:Grimms’_Fairy_Tales, dbc:The_Devil_in_fairy_tales ; rdfs:label "Godfather Death" (en)
%%capture
#Install some essential packages
%pip install SPARQLWrapper pandas folium
# Import the necessary packages
from SPARQLWrapper import SPARQLWrapper, JSON
# Add some helper functions
# A function that will return the results of running a SPARQL query with
# a defined set of prefixes over a specified endpoint.
# It follows the same five-step process apart from creating the query, which
# is provided as an argument to the function.
def runQuery(endpoint, prefix, q):
    ''' Run a SPARQL query with a declared prefix over a specified endpoint '''
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(prefix+q) # concatenate the strings representing the prefixes and the query
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()
# Import pandas to provide facilities for creating a DataFrame to hold results
import pandas as pd
# Function to convert query results into a DataFrame
# The results are assumed to be in JSON format and therefore the Python dictionary will have
# the results indexed by 'results' and then 'bindings'.
def dict2df(results):
    ''' A function to flatten the SPARQL query results and return the column values '''
    data = []
    for result in results["results"]["bindings"]:
        tmp = {}
        for el in result:
            tmp[el] = result[el]['value']
        data.append(tmp)
    df = pd.DataFrame(data)
    return df
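SPARQL JSON results only carry a binding for variables that were actually bound in a given row, so a query using OPTIONAL can return rows with missing keys. The sketch below uses only the standard library and a hand-made sample in the shape of the SPARQL JSON results format. It performs the same flattening as dict2df, but fills gaps using the variable list in head.vars. The name flatten_bindings and the sample rows (including the placeholder "Example Tale") are illustrative assumptions, not part of the notebook.

```python
def flatten_bindings(results, missing=None):
    """Flatten SPARQL JSON results into one plain dict per row.

    Every variable listed in head.vars appears in every row; variables
    that were unbound in a row are filled with `missing`.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for var in variables:
            # Each bound variable maps to a dict with at least 'type' and 'value'
            row[var] = binding[var]["value"] if var in binding else missing
        rows.append(row)
    return rows


# Hand-made sample in the SPARQL JSON results shape (illustrative data only)
sample = {
    "head": {"vars": ["story_name", "atu"]},
    "results": {
        "bindings": [
            {
                "story_name": {"type": "literal", "xml:lang": "en", "value": "Godfather Death"},
                "atu": {"type": "literal", "xml:lang": "en", "value": "ATU 332"},
            },
            {
                # No 'atu' binding in this row, as would happen with OPTIONAL
                "story_name": {"type": "literal", "xml:lang": "en", "value": "Example Tale"},
            },
        ]
    },
}

rows = flatten_bindings(sample)
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as the data list built inside dict2df.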
# Function to run a query and return results in a DataFrame
def dfResults(endpoint, prefix, q):
    ''' Generate a data frame containing the results of running
    a SPARQL query with a declared prefix over a specified endpoint '''
    return dict2df(runQuery(endpoint, prefix, q))
# Print a limited number of results of a query
def printQuery(results, limit=None):
    ''' Print the results from the SPARQL query, optionally only the first `limit` rows '''
    resdata = results["results"]["bindings"]
    if limit is not None:
        resdata = resdata[:limit]
    for result in resdata:
        for ans in result:
            print('{0}: {1}'.format(ans, result[ans]['value']))
        print()
# Run a query and print out a limited number of results
def printRunQuery(endpoint, prefix, q, limit=None):
    ''' Run a SPARQL query and print the results '''
    results = runQuery(endpoint, prefix, q)
    printQuery(results, limit)
# Define any prefixes
prefix = '''
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dbp: <https://www.w3.org/1999/02/22-rdf-syntax-ns#Property>
PREFIX ouseful:<http://ouseful.info/>
'''
# Declare the DBpedia endpoint
endpoint="http://dbpedia.org/sparql"
sparql = SPARQLWrapper(endpoint)
q = '''
SELECT DISTINCT ?story_name ?src WHERE {
?story dct:subject dbc:The_Devil_in_fairy_tales .
?story rdfs:label ?story_name .
?story prov:wasDerivedFrom ?src .
FILTER (langMatches(lang(?story_name), "en"))
}
LIMIT 10
'''
df = dfResults(endpoint, prefix, q)
df
| | story_name | src |
|---|---|---|
0 | The Girl Without Hands | http://en.wikipedia.org/wiki/The_Girl_Without_... |
1 | Godfather Death | http://en.wikipedia.org/wiki/Godfather_Death?o... |
2 | Jack the Giant Killer | http://en.wikipedia.org/wiki/Jack_the_Giant_Ki... |
3 | The Snow Queen | http://en.wikipedia.org/wiki/The_Snow_Queen?ol... |
4 | Errementari | http://en.wikipedia.org/wiki/Errementari?oldid... |
5 | How the Devil Married Three Sisters | http://en.wikipedia.org/wiki/How_the_Devil_Mar... |
6 | Why the Sea is Salt | http://en.wikipedia.org/wiki/Why_the_Sea_is_Sa... |
7 | Jean, the Soldier, and Eulalie, the Devil's Da... | http://en.wikipedia.org/wiki/Jean,_the_Soldier... |
8 | Little Johnny Sheep-Dung | http://en.wikipedia.org/wiki/Little_Johnny_She... |
9 | The Lost Children (fairy tale) | http://en.wikipedia.org/wiki/The_Lost_Children... |
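The src values returned by prov:wasDerivedFrom are pinned to a specific Wikipedia revision, as the ?oldid suffix visible in the output suggests. If a link to the current article is wanted instead, one option is to drop the query string. The following is a minimal sketch using the standard library; the helper name strip_revision and the revision id in the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_revision(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out the query and fragment
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical revision id, for illustration only
example = "http://en.wikipedia.org/wiki/Godfather_Death?oldid=123456789"
print(strip_revision(example))
```

Applied to the results table, this could be used as df['src'].map(strip_revision).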
q = '''
SELECT (COUNT(*) AS ?count) WHERE {
?story dbp:aarneThompsonGrouping ?atu .
?story rdfs:label ?story_name .
FILTER (langMatches(lang(?story_name), "en"))
}
'''
df = dfResults(endpoint, prefix, q)
df
| | count |
|---|---|
0 | 0 |
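A count of zero here probably reflects the prefix block rather than the data. In the prefixes declared above, dbp: is bound to https://www.w3.org/1999/02/22-rdf-syntax-ns#Property, whereas DBpedia's raw infobox properties such as dbp:aarneThompsonGrouping live under http://dbpedia.org/property/. The sketch below is one way to catch this kind of slip before running a query: it parses the PREFIX declarations and compares them against a small table of expected namespaces. The helper names and the expected table are assumptions made for illustration.

```python
import re

# Matches lines such as "PREFIX dbp: <http://...>" (the space after ':' is optional)
PREFIX_RE = re.compile(r"PREFIX\s+(\w*)\s*:\s*<([^>]*)>", re.IGNORECASE)


def parse_prefixes(block):
    """Return a {prefix_name: namespace_iri} dict from a block of PREFIX lines."""
    return {name: iri for name, iri in PREFIX_RE.findall(block)}


def find_mismatches(block, expected):
    """Return {name: (declared_iri, expected_iri)} for prefixes that differ."""
    declared = parse_prefixes(block)
    return {
        name: (declared[name], iri)
        for name, iri in expected.items()
        if name in declared and declared[name] != iri
    }


# Expected namespaces (assumed here to follow DBpedia's usual bindings)
expected = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbp": "http://dbpedia.org/property/",
}

# Two of the declarations from the prefix block used in this notebook
declared_block = '''
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <https://www.w3.org/1999/02/22-rdf-syntax-ns#Property>
'''

mismatches = find_mismatches(declared_block, expected)
print(mismatches)
```

Rebinding dbp: to http://dbpedia.org/property/ and rerunning the count would show whether the zero was an artefact of the prefix.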
Folk Songs
For example, https://dbpedia.org/page/The_Raggle_Taggle_Gypsy, derived from https://en.wikipedia.org/wiki/The_Raggle_Taggle_Gypsy
The resource has gold:hypernym dbr:Song (PREFIX gold: http://linguistics-ontology.org/gold/hypernym)
This song has a Roud number, but the DBpedia resource has no Roud number attribute; it is also a Child Ballad, but there is no Child Ballad number attribute either.
Also Wikidata: http://localhost:8888/notebooks/Documents/GitHub/lang-fairy-books/lang-fairy-books-db.ipynb which does have, e.g., a Roud number.