Story Notes - Technical Recipes

Annotating Lang’s Fairy Tales With Wikipedia Links


The Wikipedia page Lang's_Fairy_Books lists the contents of Lang’s coloured fairy books (as well as several other books), along with links to the Wikipedia page associated with each tale, if available.

This means we can have a go at annotating our database with Wikipedia links for each story. From those pages in turn, or associated DBpedia pages, we might also be able to extract Aarne-Thompson classification codes for the corresponding stories.

from sqlite_utils import Database

db_name = "lang_fairy_tale.db"
db = Database(db_name)
conn = db.conn

# Load in the sql magic
%load_ext sql
%sql sqlite:///$db_name

Load in the Wikipedia page that lists Lang’s Fairy Book collections and provides links to other Wikipedia pages associated with the stories they contain.

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Lang's_Fairy_Books"

html = requests.get(url)

We now make some lovely soup from the page that we can then start to fish entrails out of:

wp_soup = BeautifulSoup(html.content, "html.parser")
# Find the span for a particular book
wp_book_loc =  wp_soup.find("span", id="The_Blue_Fairy_Book_(1889)")

# Then navigate relative to this to get the (linked) story list
wp_book_stories = wp_book_loc.find_parent().find_next("ul").find_all('li')
wp_book_stories[:3]
[<li>"<a href="/wiki/The_Bronze_Ring" title="The Bronze Ring">The Bronze Ring</a>"</li>,
 <li>"<a href="/wiki/Prince_Hyacinth_and_the_Dear_Little_Princess" title="Prince Hyacinth and the Dear Little Princess">Prince Hyacinth and the Dear Little Princess</a>"</li>,
 <li>"<a href="/wiki/East_of_the_Sun_and_West_of_the_Moon" title="East of the Sun and West of the Moon">East of the Sun and West of the Moon</a>"</li>]

Get the Wikipedia path for stories with a Wikipedia page:

wp_book_paths = [(li.find("a").get("title"), li.find("a").get("href")) for li in wp_book_stories]

wp_book_paths[:3]
[('The Bronze Ring', '/wiki/The_Bronze_Ring'),
 ('Prince Hyacinth and the Dear Little Princess',
  '/wiki/Prince_Hyacinth_and_the_Dear_Little_Princess'),
 ('East of the Sun and West of the Moon',
  '/wiki/East_of_the_Sun_and_West_of_the_Moon')]
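The paths scraped above are relative to the Wikipedia site root, and some of them are percent-encoded (for example `%27` for an apostrophe) or carry a `#section` fragment. As a minimal sketch, assuming the pages live under `https://en.wikipedia.org`, the standard library can expand them into full URLs and readable page names. The helper name `expand_wp_path` and the constant `WIKIPEDIA_BASE` are made up for this illustration.

```python
from urllib.parse import urljoin, unquote, urldefrag

# Assumption: the relative /wiki/ paths resolve against the English Wikipedia
WIKIPEDIA_BASE = "https://en.wikipedia.org"


def expand_wp_path(path):
    """Return (full_url, page_name, fragment) for a relative /wiki/ path."""
    full_url = urljoin(WIKIPEDIA_BASE, path)
    # Separate any "#section" fragment from the page URL
    page_url, fragment = urldefrag(full_url)
    # Take the last path segment, decode %xx escapes, swap underscores for spaces
    page_name = unquote(page_url.rsplit("/", 1)[-1]).replace("_", " ")
    return full_url, page_name, unquote(fragment)


sample_paths = ["/wiki/The_Bronze_Ring", "/wiki/Hop_o%27_My_Thumb"]
expanded = [expand_wp_path(p) for p in sample_paths]
```

Keeping the fragment separate may be handy for entries that point at a section of a larger page rather than at a page about the tale itself.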

This might be more useful as a list of dicts or a pandas DataFrame:

import pandas as pd

wp_book_paths_wide = []

for item in wp_book_paths:
    wp_book_paths_wide.append( {"title":item[0].strip(), "path":item[1]} )
    
wp_book_df = pd.DataFrame(wp_book_paths_wide)
wp_book_df
title path
0 The Bronze Ring /wiki/The_Bronze_Ring
1 Prince Hyacinth and the Dear Little Princess /wiki/Prince_Hyacinth_and_the_Dear_Little_Prin...
2 East of the Sun and West of the Moon /wiki/East_of_the_Sun_and_West_of_the_Moon
3 The Yellow Dwarf /wiki/The_Yellow_Dwarf
4 Little Red Riding Hood /wiki/Little_Red_Riding_Hood
5 Sleeping Beauty /wiki/Sleeping_Beauty
6 Cinderella /wiki/Cinderella
7 Aladdin /wiki/Aladdin
8 The Story of the Youth Who Went Forth to Learn... /wiki/The_Story_of_the_Youth_Who_Went_Forth_to...
9 Rumpelstiltskin /wiki/Rumpelstiltskin
10 Beauty and the Beast /wiki/Beauty_and_the_Beast
11 The Master Maid /wiki/The_Master_Maid
12 Why the Sea Is Salt /wiki/Why_the_Sea_Is_Salt
13 Puss in Boots /wiki/Puss_in_Boots
14 Felicia and the Pot of Pinks /wiki/Felicia_and_the_Pot_of_Pinks
15 The White Cat (fairy tale) /wiki/The_White_Cat_(fairy_tale)
16 The Water-lily. The Gold-spinners /wiki/The_Water-lily._The_Gold-spinners
17 Perseus /wiki/Perseus
18 The Story of Pretty Goldilocks /wiki/The_Story_of_Pretty_Goldilocks
19 Dick Whittington /wiki/Dick_Whittington
20 The Wonderful Sheep /wiki/The_Wonderful_Sheep
21 Hop o' My Thumb /wiki/Hop_o%27_My_Thumb
22 Ali Baba /wiki/Ali_Baba
23 Hansel and Gretel /wiki/Hansel_and_Gretel
24 Snow-White and Rose-Red /wiki/Snow-White_and_Rose-Red
25 The Goose Girl /wiki/The_Goose_Girl
26 Diamonds and Toads /wiki/Diamonds_and_Toads
27 Prince Darling /wiki/Prince_Darling
28 Bluebeard /wiki/Bluebeard
29 Trusty John /wiki/Trusty_John
30 The Valiant Little Tailor /wiki/The_Valiant_Little_Tailor
31 Gulliver's Travels /wiki/Gulliver%27s_Travels#Part_I:_A_Voyage_to...
32 The Princess on the Glass Hill /wiki/The_Princess_on_the_Glass_Hill
33 Ahmed (Arabian Nights) /wiki/Ahmed_(Arabian_Nights)
34 Jack the Giant Killer /wiki/Jack_the_Giant_Killer
35 The Black Bull of Norroway /wiki/The_Black_Bull_of_Norroway
36 The Red Ettin /wiki/The_Red_Ettin

Let’s see if we can cross-reference these with the stories in the database:

q = "SELECT book, title, chapter_order FROM books WHERE book='The Blue Fairy Book' ORDER BY chapter_order ASC"
df_blue = pd.read_sql(q, conn)

df_blue.head()
   book                 title                                         chapter_order
0  The Blue Fairy Book  The Bronze Ring                               0
1  The Blue Fairy Book  Prince Hyacinth And The Dear Little Princess  1
2  The Blue Fairy Book  East Of The Sun And West Of The Moon          2
3  The Blue Fairy Book  The Yellow Dwarf                              3
4  The Blue Fairy Book  Little Red Riding Hood                        4

Let’s see if the chapters align in terms of order as presented:

pd.DataFrame({"book":df_blue["title"], "wp":wp_book_df["title"], "wp_path":wp_book_df["path"]})
book wp wp_path
0 The Bronze Ring The Bronze Ring /wiki/The_Bronze_Ring
1 Prince Hyacinth And The Dear Little Princess Prince Hyacinth and the Dear Little Princess /wiki/Prince_Hyacinth_and_the_Dear_Little_Prin...
2 East Of The Sun And West Of The Moon East of the Sun and West of the Moon /wiki/East_of_the_Sun_and_West_of_the_Moon
3 The Yellow Dwarf The Yellow Dwarf /wiki/The_Yellow_Dwarf
4 Little Red Riding Hood Little Red Riding Hood /wiki/Little_Red_Riding_Hood
5 The Sleeping Beauty In The Wood Sleeping Beauty /wiki/Sleeping_Beauty
6 Cinderella, Or The Little Glass Slipper Cinderella /wiki/Cinderella
7 Aladdin And The Wonderful Lamp Aladdin /wiki/Aladdin
8 The Tale Of A Youth Who Set Out To Learn What ... The Story of the Youth Who Went Forth to Learn... /wiki/The_Story_of_the_Youth_Who_Went_Forth_to...
9 Rumpelstiltzkin Rumpelstiltskin /wiki/Rumpelstiltskin
10 Beauty And The Beast Beauty and the Beast /wiki/Beauty_and_the_Beast
11 The Master-Maid The Master Maid /wiki/The_Master_Maid
12 Why The Sea Is Salt Why the Sea Is Salt /wiki/Why_the_Sea_Is_Salt
13 The Master Cat; Or, Puss In Boots Puss in Boots /wiki/Puss_in_Boots
14 Felicia And The Pot Of Pinks Felicia and the Pot of Pinks /wiki/Felicia_and_the_Pot_of_Pinks
15 The White Cat The White Cat (fairy tale) /wiki/The_White_Cat_(fairy_tale)
16 The Water-Lily. The Gold-Spinners The Water-lily. The Gold-spinners /wiki/The_Water-lily._The_Gold-spinners
17 The Terrible Head Perseus /wiki/Perseus
18 The Story Of Pretty Goldilocks The Story of Pretty Goldilocks /wiki/The_Story_of_Pretty_Goldilocks
19 The History Of Whittington Dick Whittington /wiki/Dick_Whittington
20 The Wonderful Sheep The Wonderful Sheep /wiki/The_Wonderful_Sheep
21 Little Thumb Hop o' My Thumb /wiki/Hop_o%27_My_Thumb
22 The Forty Thieves Ali Baba /wiki/Ali_Baba
23 Hansel And Grettel Hansel and Gretel /wiki/Hansel_and_Gretel
24 Snow-White And Rose-Red Snow-White and Rose-Red /wiki/Snow-White_and_Rose-Red
25 The Goose-Girl The Goose Girl /wiki/The_Goose_Girl
26 Toads And Diamonds Diamonds and Toads /wiki/Diamonds_and_Toads
27 Prince Darling Prince Darling /wiki/Prince_Darling
28 Blue Beard Bluebeard /wiki/Bluebeard
29 Trusty John Trusty John /wiki/Trusty_John
30 The Brave Little Tailor The Valiant Little Tailor /wiki/The_Valiant_Little_Tailor
31 A Voyage To Lilliput Gulliver's Travels /wiki/Gulliver%27s_Travels#Part_I:_A_Voyage_to...
32 The Princess On The Glass Hill The Princess on the Glass Hill /wiki/The_Princess_on_the_Glass_Hill
33 The Story Of Prince Ahmed And The Fairy Paribanou Ahmed (Arabian Nights) /wiki/Ahmed_(Arabian_Nights)
34 The History Of Jack The Giant-Killer Jack the Giant Killer /wiki/Jack_the_Giant_Killer
35 The Black Bull Of Norroway The Black Bull of Norroway /wiki/The_Black_Bull_of_Norroway
36 The Red Etin The Red Ettin /wiki/The_Red_Ettin

Yes, they do, so we can use the chapter order as the basis of a merge. That said, in the general case it would probably also be useful to generate a fuzzy match score between the matched titles, with a report on any low-scoring matches, just in case the alignment has gone awry.
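One way of sketching such a check, using the standard library's `difflib.SequenceMatcher` as a stand-in for a dedicated fuzzy matching package, is to score each position-aligned pair of titles and report the pairs that fall below a threshold. The function name `alignment_report` and the threshold of 0.6 are arbitrary choices for this illustration, and the three title pairs are taken from the table above.

```python
from difflib import SequenceMatcher


def alignment_report(db_titles, wp_titles, threshold=0.6):
    """Score position-aligned title pairs; return the pairs scoring below threshold."""
    low_scoring = []
    for position, (db_title, wp_title) in enumerate(zip(db_titles, wp_titles)):
        # Case-insensitive similarity score in the range 0..1
        score = SequenceMatcher(None, db_title.lower(), wp_title.lower()).ratio()
        if score < threshold:
            low_scoring.append((position, db_title, wp_title, round(score, 2)))
    return low_scoring


db_titles = ["The Bronze Ring", "The Master-Maid", "The Terrible Head"]
wp_titles = ["The Bronze Ring", "The Master Maid", "Perseus"]
report = alignment_report(db_titles, wp_titles)
```

A low score does not necessarily mean the alignment is wrong: "The Terrible Head" really is the Perseus story, but the titles share almost no text. The report is only a prompt for a manual check.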

# TO DO  - wp table for links, story and story order?
# TO DO fuzzy match score test just to check ingest and allow user to check poor matches
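As a rough sketch of the first of those TO DO items, the links could be stored in their own table, keyed on the book and the chapter position, and then joined back onto the story table. The example below uses the standard library `sqlite3` module and an in-memory database rather than `sqlite_utils` and the real database file. The table name `wp_links` and its column names are assumptions made for this illustration, and the `books` table is a cut-down stand-in containing only the columns queried earlier.

```python
import sqlite3

# In-memory stand-in for lang_fairy_tale.db
conn = sqlite3.connect(":memory:")

# Minimal stand-in for the existing books table (only the columns used here)
conn.execute("CREATE TABLE books (book TEXT, title TEXT, chapter_order INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("The Blue Fairy Book", "The Bronze Ring", 0),
        ("The Blue Fairy Book", "Prince Hyacinth And The Dear Little Princess", 1),
    ],
)

# Hypothetical links table, keyed on book and chapter position
conn.execute(
    """CREATE TABLE wp_links (
           book TEXT, chapter_order INTEGER, wp_title TEXT, wp_path TEXT,
           PRIMARY KEY (book, chapter_order))"""
)
wp_rows = [
    ("The Bronze Ring", "/wiki/The_Bronze_Ring"),
    ("Prince Hyacinth and the Dear Little Princess",
     "/wiki/Prince_Hyacinth_and_the_Dear_Little_Princess"),
]
conn.executemany(
    "INSERT INTO wp_links VALUES (?, ?, ?, ?)",
    [("The Blue Fairy Book", i, title, path)
     for i, (title, path) in enumerate(wp_rows)],
)

# Join each story to its Wikipedia path using book and chapter order
joined = conn.execute(
    """SELECT b.title, w.wp_path
       FROM books b LEFT JOIN wp_links w
         ON b.book = w.book AND b.chapter_order = w.chapter_order
       ORDER BY b.chapter_order"""
).fetchall()
```

Using a left join means a story without a link would still appear in the result, with an empty path.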

In passing, what if we wanted to try to match on the titles themselves?

If we use case-insensitive, but otherwise exact, matching, we see it’s a bit flaky…

pd.merge(df_blue["title"], wp_book_df,
         left_on=df_blue["title"].str.lower(),
         right_on=wp_book_df["title"].str.lower(),
         how ="left" )
key_0 title_x title_y path
0 the bronze ring The Bronze Ring The Bronze Ring /wiki/The_Bronze_Ring
1 prince hyacinth and the dear little princess Prince Hyacinth And The Dear Little Princess Prince Hyacinth and the Dear Little Princess /wiki/Prince_Hyacinth_and_the_Dear_Little_Prin...
2 east of the sun and west of the moon East Of The Sun And West Of The Moon East of the Sun and West of the Moon /wiki/East_of_the_Sun_and_West_of_the_Moon
3 the yellow dwarf The Yellow Dwarf The Yellow Dwarf /wiki/The_Yellow_Dwarf
4 little red riding hood Little Red Riding Hood Little Red Riding Hood /wiki/Little_Red_Riding_Hood
5 the sleeping beauty in the wood The Sleeping Beauty In The Wood NaN NaN
6 cinderella, or the little glass slipper Cinderella, Or The Little Glass Slipper NaN NaN
7 aladdin and the wonderful lamp Aladdin And The Wonderful Lamp NaN NaN
8 the tale of a youth who set out to learn what ... The Tale Of A Youth Who Set Out To Learn What ... NaN NaN
9 rumpelstiltzkin Rumpelstiltzkin NaN NaN
10 beauty and the beast Beauty And The Beast Beauty and the Beast /wiki/Beauty_and_the_Beast
11 the master-maid The Master-Maid NaN NaN
12 why the sea is salt Why The Sea Is Salt Why the Sea Is Salt /wiki/Why_the_Sea_Is_Salt
13 the master cat; or, puss in boots The Master Cat; Or, Puss In Boots NaN NaN
14 felicia and the pot of pinks Felicia And The Pot Of Pinks Felicia and the Pot of Pinks /wiki/Felicia_and_the_Pot_of_Pinks
15 the white cat The White Cat NaN NaN
16 the water-lily. the gold-spinners The Water-Lily. The Gold-Spinners The Water-lily. The Gold-spinners /wiki/The_Water-lily._The_Gold-spinners
17 the terrible head The Terrible Head NaN NaN
18 the story of pretty goldilocks The Story Of Pretty Goldilocks The Story of Pretty Goldilocks /wiki/The_Story_of_Pretty_Goldilocks
19 the history of whittington The History Of Whittington NaN NaN
20 the wonderful sheep The Wonderful Sheep The Wonderful Sheep /wiki/The_Wonderful_Sheep
21 little thumb Little Thumb NaN NaN
22 the forty thieves The Forty Thieves NaN NaN
23 hansel and grettel Hansel And Grettel NaN NaN
24 snow-white and rose-red Snow-White And Rose-Red Snow-White and Rose-Red /wiki/Snow-White_and_Rose-Red
25 the goose-girl The Goose-Girl NaN NaN
26 toads and diamonds Toads And Diamonds NaN NaN
27 prince darling Prince Darling Prince Darling /wiki/Prince_Darling
28 blue beard Blue Beard NaN NaN
29 trusty john Trusty John Trusty John /wiki/Trusty_John
30 the brave little tailor The Brave Little Tailor NaN NaN
31 a voyage to lilliput A Voyage To Lilliput NaN NaN
32 the princess on the glass hill The Princess On The Glass Hill The Princess on the Glass Hill /wiki/The_Princess_on_the_Glass_Hill
33 the story of prince ahmed and the fairy paribanou The Story Of Prince Ahmed And The Fairy Paribanou NaN NaN
34 the history of jack the giant-killer The History Of Jack The Giant-Killer NaN NaN
35 the black bull of norroway The Black Bull Of Norroway The Black Bull of Norroway /wiki/The_Black_Bull_of_Norroway
36 the red etin The Red Etin NaN NaN
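Several of the misses above differ only in punctuation, such as "The Master-Maid" and "The Master Maid". A slightly more forgiving exact match can be sketched by building the merge key from a normalised title: lowercase it, then collapse any run of non-alphanumeric characters into a single space. The helper name `title_key` is made up for this illustration.

```python
import re


def title_key(title):
    """Lowercase a title and collapse each run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


pairs = [
    ("The Master-Maid", "The Master Maid"),
    ("The Goose-Girl", "The Goose Girl"),
    ("Blue Beard", "Bluebeard"),
]
# For each pair, record whether the normalised keys are identical
key_matches = [(a, b, title_key(a) == title_key(b)) for a, b in pairs]
```

This recovers the hyphenation differences, but not spelling or spacing variants such as "Blue Beard" and "Bluebeard", and it does nothing for stories that go by a completely different name.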

A fuzzy match might be able to improve things…

# Reused from https://stackoverflow.com/a/56315491/454773
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

def fuzzy_merge(df_1, df_2, key1, key2, threshold=90, limit=2):
    """
    :param df_1: the left table to join
    :param df_2: the right table to join
    :param key1: key column of the left table
    :param key2: key column of the right table
    :param threshold: minimum match score (0-100) needed to keep a match, based on Levenshtein distance
    :param limit: the number of matches returned per row, sorted high to low
    :return: copy of df_1 with a "matches" column of matched keys
    """
    s = df_2[key2].tolist()

    # Work on a copy so the caller's dataframe is not modified in place
    df_1 = df_1.copy()

    # For each left key, get the top `limit` candidates as (match, score) tuples
    m = df_1[key1].apply(lambda x: process.extract(x, s, limit=limit))
    df_1['matches'] = m

    # Keep only the candidates that score at or above the threshold
    m2 = df_1['matches'].apply(lambda x: ', '.join([i[0] for i in x if i[1] >= threshold]))
    df_1['matches'] = m2
    return df_1

fuzzy_merge(df_blue, wp_book_df, "title", "title", 88, limit=1)[["title", "matches"]]
title matches
0 The Bronze Ring The Bronze Ring
1 Prince Hyacinth And The Dear Little Princess Prince Hyacinth and the Dear Little Princess
2 East Of The Sun And West Of The Moon East of the Sun and West of the Moon
3 The Yellow Dwarf The Yellow Dwarf
4 Little Red Riding Hood Little Red Riding Hood
5 The Sleeping Beauty In The Wood Sleeping Beauty
6 Cinderella, Or The Little Glass Slipper Cinderella
7 Aladdin And The Wonderful Lamp Aladdin
8 The Tale Of A Youth Who Set Out To Learn What ...
9 Rumpelstiltzkin Rumpelstiltskin
10 Beauty And The Beast Beauty and the Beast
11 The Master-Maid The Master Maid
12 Why The Sea Is Salt Why the Sea Is Salt
13 The Master Cat; Or, Puss In Boots Puss in Boots
14 Felicia And The Pot Of Pinks Felicia and the Pot of Pinks
15 The White Cat The White Cat (fairy tale)
16 The Water-Lily. The Gold-Spinners The Water-lily. The Gold-spinners
17 The Terrible Head
18 The Story Of Pretty Goldilocks The Story of Pretty Goldilocks
19 The History Of Whittington
20 The Wonderful Sheep The Wonderful Sheep
21 Little Thumb
22 The Forty Thieves
23 Hansel And Grettel Hansel and Gretel
24 Snow-White And Rose-Red Snow-White and Rose-Red
25 The Goose-Girl The Goose Girl
26 Toads And Diamonds Diamonds and Toads
27 Prince Darling Prince Darling
28 Blue Beard Bluebeard
29 Trusty John Trusty John
30 The Brave Little Tailor
31 A Voyage To Lilliput
32 The Princess On The Glass Hill The Princess on the Glass Hill
33 The Story Of Prince Ahmed And The Fairy Paribanou
34 The History Of Jack The Giant-Killer Jack the Giant Killer
35 The Black Bull Of Norroway The Black Bull of Norroway
36 The Red Etin The Red Ettin

# https://github.com/jsoma/fuzzy_pandas/

# This is probably overkill...
#%pip install fuzzy_pandas
import fuzzy_pandas as fpd

fpd.fuzzy_merge(df_blue[["title"]], wp_book_df,
            left_on='title',
            right_on='title',
            ignore_case=True,
            ignore_nonalpha=True,
            method='jaro', #bilenko, levenshtein, metaphone, jaro
            threshold=0.86, # If we move to 0.86 we get a false positive...
            keep_left='all',
            keep_right="all"
               )
title title path
0 The Bronze Ring The Bronze Ring /wiki/The_Bronze_Ring
1 Prince Hyacinth And The Dear Little Princess Prince Hyacinth and the Dear Little Princess /wiki/Prince_Hyacinth_and_the_Dear_Little_Prin...
2 East Of The Sun And West Of The Moon East of the Sun and West of the Moon /wiki/East_of_the_Sun_and_West_of_the_Moon
3 The Yellow Dwarf The Yellow Dwarf /wiki/The_Yellow_Dwarf
4 Little Red Riding Hood Little Red Riding Hood /wiki/Little_Red_Riding_Hood
5 Cinderella, Or The Little Glass Slipper Cinderella /wiki/Cinderella
6 Rumpelstiltzkin Rumpelstiltskin /wiki/Rumpelstiltskin
7 Beauty And The Beast Beauty and the Beast /wiki/Beauty_and_the_Beast
8 The Master-Maid The Master Maid /wiki/The_Master_Maid
9 Why The Sea Is Salt Why the Sea Is Salt /wiki/Why_the_Sea_Is_Salt
10 Felicia And The Pot Of Pinks Felicia and the Pot of Pinks /wiki/Felicia_and_the_Pot_of_Pinks
11 The White Cat The White Cat (fairy tale) /wiki/The_White_Cat_(fairy_tale)
12 The Water-Lily. The Gold-Spinners The Water-lily. The Gold-spinners /wiki/The_Water-lily._The_Gold-spinners
13 The Story Of Pretty Goldilocks The Story of Pretty Goldilocks /wiki/The_Story_of_Pretty_Goldilocks
14 The Wonderful Sheep The Wonderful Sheep /wiki/The_Wonderful_Sheep
15 Hansel And Grettel Hansel and Gretel /wiki/Hansel_and_Gretel
16 Snow-White And Rose-Red Snow-White and Rose-Red /wiki/Snow-White_and_Rose-Red
17 The Goose-Girl The Goose Girl /wiki/The_Goose_Girl
18 Prince Darling Prince Darling /wiki/Prince_Darling
19 Blue Beard Bluebeard /wiki/Bluebeard
20 Trusty John Trusty John /wiki/Trusty_John
21 The Princess On The Glass Hill The Princess on the Glass Hill /wiki/The_Princess_on_the_Glass_Hill
22 The Black Bull Of Norroway The Black Bull of Norroway /wiki/The_Black_Bull_of_Norroway
23 The Red Etin The Red Ettin /wiki/The_Red_Ettin

fpd.fuzzy_merge(df_blue[["title"]], wp_book_df,
            left_on='title',
            right_on='title',
            ignore_case=True,
            ignore_nonalpha=True,
            method='metaphone', #levenshtein, metaphone, jaro, bilenko
            threshold=0.86,
            keep_left='all',
            keep_right="all"
               )
title title path
0 The Bronze Ring The Bronze Ring /wiki/The_Bronze_Ring
1 Prince Hyacinth And The Dear Little Princess Prince Hyacinth and the Dear Little Princess /wiki/Prince_Hyacinth_and_the_Dear_Little_Prin...
2 East Of The Sun And West Of The Moon East of the Sun and West of the Moon /wiki/East_of_the_Sun_and_West_of_the_Moon
3 The Yellow Dwarf The Yellow Dwarf /wiki/The_Yellow_Dwarf
4 Little Red Riding Hood Little Red Riding Hood /wiki/Little_Red_Riding_Hood
5 Rumpelstiltzkin Rumpelstiltskin /wiki/Rumpelstiltskin
6 Beauty And The Beast Beauty and the Beast /wiki/Beauty_and_the_Beast
7 The Master-Maid The Master Maid /wiki/The_Master_Maid
8 Why The Sea Is Salt Why the Sea Is Salt /wiki/Why_the_Sea_Is_Salt
9 Felicia And The Pot Of Pinks Felicia and the Pot of Pinks /wiki/Felicia_and_the_Pot_of_Pinks
10 The Water-Lily. The Gold-Spinners The Water-lily. The Gold-spinners /wiki/The_Water-lily._The_Gold-spinners
11 The Story Of Pretty Goldilocks The Story of Pretty Goldilocks /wiki/The_Story_of_Pretty_Goldilocks
12 The Wonderful Sheep The Wonderful Sheep /wiki/The_Wonderful_Sheep
13 Hansel And Grettel Hansel and Gretel /wiki/Hansel_and_Gretel
14 Snow-White And Rose-Red Snow-White and Rose-Red /wiki/Snow-White_and_Rose-Red
15 The Goose-Girl The Goose Girl /wiki/The_Goose_Girl
16 Prince Darling Prince Darling /wiki/Prince_Darling
17 Blue Beard Bluebeard /wiki/Bluebeard
18 Trusty John Trusty John /wiki/Trusty_John
19 The Princess On The Glass Hill The Princess on the Glass Hill /wiki/The_Princess_on_the_Glass_Hill
20 The Black Bull Of Norroway The Black Bull of Norroway /wiki/The_Black_Bull_of_Norroway
21 The Red Etin The Red Ettin /wiki/The_Red_Ettin

Other Things to Link In

Have other people generated data sets that can be linked in?

  • http://www.mythfolklore.net/andrewlang/indexbib.htm /via @OnlineCrsLady

By Tony Hirst
© Copyright 2022.