Story Notes - Technical Recipes

Aarne-Thompson-Uther (ATU) Search

Linked data search over ATU tagged resources.

Wikipedia page: https://en.wikipedia.org/wiki/Godfather_Death

DBpedia page: https://live.dbpedia.org/page/Godfather_Death

The DBpedia page gives us, for example:

  • https://live.dbpedia.org/page/Category:The_Devil_in_fairy_tales (rdf:type skos:Concept ; rdfs:label The Devil in fairy tales (en) )

  • dbp:aarneThompsonGrouping ATU 332 (en); dbp:country Germany (en) ; dbp:folkTaleName Godfather Death (en) ; dct:subject dbc:Grimms’_Fairy_Tales dbc:The_Devil_in_fairy_tales ; rdfs:label Godfather Death (en)
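
The dbp:aarneThompsonGrouping value shown above is a plain string ("ATU 332") rather than a number, so anything that sorts or groups tales by type has to parse it first. The following is a minimal sketch of one way to do that; the helper name `parse_atu` and the assumption that values look like "ATU" followed by digits and an optional letter suffix are illustrative, and real DBpedia values may well be messier.

```python
import re

# Assumed value format: "ATU 332" or "ATU 510A" (hypothetical pattern;
# DBpedia literals are free text and may not all follow it).
ATU_PATTERN = re.compile(r"ATU\s*(\d+)([A-Z]?)")

def parse_atu(grouping):
    """Return (number, suffix) from a grouping string, or None if no ATU type is found."""
    match = ATU_PATTERN.search(grouping)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

print(parse_atu("ATU 332"))
```

Returning `None` for unparseable values, rather than raising, keeps the helper usable when it is mapped over a whole results column.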

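The DBpedia SNORQL explorer (https://dbpedia.org/snorql/) accepts a query through its ?query= URL parameter. For example, to count the English-labelled tales that carry an ATU grouping:

SELECT DISTINCT COUNT(*) WHERE {
  ?story dbp:aarneThompsonGrouping ?atu .
  ?story gold:hypernym dbr:Tale .
  ?story rdfs:label ?story_name .
  FILTER (langMatches(lang(?story_name), "en"))
}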
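
A SNORQL link of the kind shown above can be generated rather than typed by hand, which avoids encoding slips such as curly quotes creeping into the query. This is a small sketch using only the standard library; the function name `snorql_url` is made up for illustration, and it assumes SNORQL reads the query from the `query` parameter as in the link above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SNORQL_BASE = "https://dbpedia.org/snorql/"

# The counting query from the text, with plain ASCII double quotes.
query = """SELECT DISTINCT COUNT(*) WHERE {
  ?story dbp:aarneThompsonGrouping ?atu .
  ?story gold:hypernym dbr:Tale .
  ?story rdfs:label ?story_name .
  FILTER (langMatches(lang(?story_name), "en"))
}"""

def snorql_url(q, base=SNORQL_BASE):
    """Build a SNORQL link with the query percent-encoded into ?query=."""
    return base + "?" + urlencode({"query": q})

url = snorql_url(query)
print(url)

# Decoding the link again should give back exactly the query we started with.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == query)
```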

%%capture
#Install some essential packages
%pip install SPARQLWrapper pandas folium
# Import the necessary packages
from SPARQLWrapper import SPARQLWrapper, JSON

# Add some helper functions

# A function that will return the results of running a SPARQL query with 
# a defined set of prefixes over a specified endpoint.
# It wraps the usual SPARQLWrapper steps (create the wrapper, set the query,
# set the return format, run the query, convert the response); the query
# itself is provided as an argument to the function.
def runQuery(endpoint, prefix, q):
    ''' Run a SPARQL query with a declared prefix over a specified endpoint '''
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(prefix+q) # concatenate the strings representing the prefixes and the query
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()
    
# Import pandas to provide facilities for creating a DataFrame to hold results
import pandas as pd

# Function to convert query results into a DataFrame
# The results are assumed to be in JSON format and therefore the Python dictionary will have  
# the results indexed by 'results' and then 'bindings'. 
def dict2df(results):
    ''' A function to flatten the SPARQL query results and return the column values '''
    data = []
    for result in results["results"]["bindings"]:
        tmp = {}
        for el in result:
            tmp[el] = result[el]['value']
        data.append(tmp)

    df = pd.DataFrame(data)
    return df

# Function to run a query and return results in a DataFrame
def dfResults(endpoint, prefix, q):
    ''' Generate a data frame containing the results of running
        a SPARQL query with a declared prefix over a specified endpoint '''
    return dict2df(runQuery(endpoint, prefix, q))
        
# Print a limited number of results of a query
def printQuery(results, limit=None):
    ''' Print the results from the SPARQL query, optionally only the first `limit` rows '''
    resdata = results["results"]["bindings"]
    if limit is not None:
        resdata = resdata[:limit]
    for result in resdata:
        for ans in result:
            print('{0}: {1}'.format(ans, result[ans]['value']))
        print()

# Run a query and print out a limited number of results
def printRunQuery(endpoint, prefix, q, limit=None):
    ''' Run a SPARQL query over a specified endpoint and print the results '''
    results = runQuery(endpoint, prefix, q)
    printQuery(results, limit)
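
The helper functions above all rely on the shape of a SPARQL JSON response: a `head` listing the variable names, and `results` / `bindings` holding one dictionary per row. The sketch below uses a hand-written stand-in for such a response (it is not real query output, and the second row is invented) to show the flattening step without needing a network connection. The function name `flatten_bindings` is hypothetical.

```python
# Hand-written stand-in for a SPARQL JSON response (not real query output).
mock_results = {
    "head": {"vars": ["story_name", "atu"]},
    "results": {
        "bindings": [
            {
                "story_name": {"type": "literal", "xml:lang": "en", "value": "Godfather Death"},
                "atu": {"type": "literal", "xml:lang": "en", "value": "ATU 332"},
            },
            {
                # A variable left unbound in a row is simply absent from that row.
                "story_name": {"type": "literal", "xml:lang": "en", "value": "Example Tale"},
            },
        ]
    },
}

def flatten_bindings(results):
    """Reduce each binding to {variable: value}, using None for unbound variables."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows

rows = flatten_bindings(mock_results)
for row in rows:
    print(row)
```

Taking the column names from `head` means every row has the same keys even when a variable is unbound; the resulting list of dictionaries can be passed straight to `pd.DataFrame`.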
# Define any prefixes
prefix = '''
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbpedia: <http://dbpedia.org/resource/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX prov: <http://www.w3.org/ns/prov#>
    PREFIX dbp: <http://dbpedia.org/property/>
    
    PREFIX ouseful:<http://ouseful.info/>
'''
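
Because a prefix bound to the wrong IRI makes a query silently match nothing, it can help to expand a prefixed name and look at the full IRI before running anything. The following sketch does that for a cut-down prefix block; the helper names `parse_prefixes` and `expand` are made up for illustration, and the block assumes DBpedia's property namespace is http://dbpedia.org/property/.

```python
import re

# A cut-down prefix block in the same style as the one above.
# Assumption: dbp: is DBpedia's property namespace.
prefix_block = '''
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbp: <http://dbpedia.org/property/>
'''

def parse_prefixes(block):
    """Return {prefix: namespace IRI} for every PREFIX line in the block."""
    return dict(re.findall(r"PREFIX\s+([\w-]*):\s*<([^>]*)>", block))

def expand(prefixed_name, prefixes):
    """Expand a name such as 'dbp:aarneThompsonGrouping' to a full IRI."""
    name, _, local = prefixed_name.partition(":")
    return prefixes[name] + local

prefixes = parse_prefixes(prefix_block)
print(expand("dbp:aarneThompsonGrouping", prefixes))
print(expand("dbc:The_Devil_in_fairy_tales", prefixes))
```

The expanded IRIs can then be compared against the property and category links shown on the DBpedia page for a known resource.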
# Declare the DBpedia endpoint
endpoint="http://dbpedia.org/sparql"
sparql = SPARQLWrapper(endpoint)
q = '''
SELECT DISTINCT ?story_name ?src WHERE {
  ?story dct:subject dbc:The_Devil_in_fairy_tales .
  ?story rdfs:label ?story_name .
  ?story prov:wasDerivedFrom ?src .
FILTER (langMatches(lang(?story_name), "en"))
}
LIMIT 10
'''
df = dfResults(endpoint, prefix, q)
df
   story_name                                          src
0  The Girl Without Hands                              http://en.wikipedia.org/wiki/The_Girl_Without_...
1  Godfather Death                                     http://en.wikipedia.org/wiki/Godfather_Death?o...
2  Jack the Giant Killer                               http://en.wikipedia.org/wiki/Jack_the_Giant_Ki...
3  The Snow Queen                                      http://en.wikipedia.org/wiki/The_Snow_Queen?ol...
4  Errementari                                         http://en.wikipedia.org/wiki/Errementari?oldid...
5  How the Devil Married Three Sisters                 http://en.wikipedia.org/wiki/How_the_Devil_Mar...
6  Why the Sea is Salt                                 http://en.wikipedia.org/wiki/Why_the_Sea_is_Sa...
7  Jean, the Soldier, and Eulalie, the Devil's Da...   http://en.wikipedia.org/wiki/Jean,_the_Soldier...
8  Little Johnny Sheep-Dung                            http://en.wikipedia.org/wiki/Little_Johnny_She...
9  The Lost Children (fairy tale)                      http://en.wikipedia.org/wiki/The_Lost_Children...
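
The category query above hard-codes dbc:The_Devil_in_fairy_tales, but the same pattern works for any DBpedia category. The sketch below wraps it in a small query builder; the function name `category_query` and the restriction to simple category names are assumptions made for illustration, and the string it returns is intended to be passed to a helper such as dfResults together with the prefix block.

```python
import re

# Same query shape as above, with the category and limit left as placeholders.
# Doubled braces are literal braces in str.format templates.
QUERY_TEMPLATE = '''
SELECT DISTINCT ?story_name ?src WHERE {{
  ?story dct:subject dbc:{category} .
  ?story rdfs:label ?story_name .
  ?story prov:wasDerivedFrom ?src .
  FILTER (langMatches(lang(?story_name), "en"))
}}
LIMIT {limit}
'''

def category_query(category, limit=10):
    """Build the category query for a simple category name.

    Only letters, digits and underscores are accepted, so that the name is
    safe to drop into a prefixed name. Categories containing other characters
    (an apostrophe, say) would need escaping or a full <IRI> instead.
    """
    if not re.fullmatch(r"\w+", category):
        raise ValueError("unsupported category name: {!r}".format(category))
    return QUERY_TEMPLATE.format(category=category, limit=int(limit))

print(category_query("The_Devil_in_fairy_tales", limit=5))
```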
q = '''
SELECT (COUNT(*) AS ?count) WHERE {
  ?story dbp:aarneThompsonGrouping ?atu .
  ?story rdfs:label ?story_name .
  FILTER (langMatches(lang(?story_name), "en"))
}
'''

df = dfResults(endpoint, prefix, q)
df
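   count
0      0

A count of 0 here does not mean that DBpedia has no ATU-tagged tales: it most likely reflects the dbp: prefix in force when the cell was run. DBpedia properties such as dbp:aarneThompsonGrouping live in the http://dbpedia.org/property/ namespace, so a dbp: prefix bound to any other IRI matches nothing.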

Folk Songs

For example, https://dbpedia.org/page/The_Raggle_Taggle_Gypsy, which is derived from https://en.wikipedia.org/wiki/The_Raggle_Taggle_Gypsy

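The DBpedia page types the resource with gold:hypernym dbr:Song, where gold: is DBpedia's prefix for the GOLD ontology namespace http://purl.org/linguistics/gold/ (see http://linguistics-ontology.org/gold/hypernym).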

This song has a Roud number, but DBpedia has no Roud number attribute; it is also a Child Ballad, but there is no Child Ballad number attribute either.

Also Wikidata, which does have, for example, a Roud number (working notebook: http://localhost:8888/notebooks/Documents/GitHub/lang-fairy-books/lang-fairy-books-db.ipynb).

The Wikidata Query Service (https://query.wikidata.org/) can be asked for the genre (wdt:P136) of an item labelled "Godfather Death":

SELECT DISTINCT * WHERE {
  ?item rdfs:label "Godfather Death"@en .
  ?item wdt:P136 ?z .
  ?z rdfs:label ?type .
  FILTER(LANG(?type) = "en") .
}
LIMIT 300


By Tony Hirst
© Copyright 2022.