Abusing XML and JSON
First off
JSON is just XML with less features and less syntax
Now, everyone knows that XML is awesome. The corollary is thus:
JSON is less awesome
Don’t use JSON.
Now that we’ve settled this, we can safely ignore the ongoing JSON-in-the-database-hype (which most of you will regret in five years anyway), and move on to the final example. How to do XML in the database.
This is what we want to do:
Given the original XML document, we want to parse that document, unnest the comma-separated list of films per actor, and produce a denormalised representation of actors/films in a single relation.
Ready. Set. Go. This is the idea. We have three CTE:
1
2
3
4
5
6
7
8
9
10
11
|
WITH RECURSIVE
x(v) AS ( SELECT '...' ::xml),
actors(
actor_id, first_name, last_name, films
) AS (...),
films(
actor_id, first_name, last_name,
film_id, film
) AS (...)
SELECT *
FROM films
|
In the first one, we simply parse the XML. Here with PostgreSQL:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
WITH RECURSIVE
x(v) AS ( SELECT '
<actors> <actor>
<first-name>Bud</first-name>
<last-name>Spencer</last-name>
<films>God Forgives... I Don’t, Double Trouble, They Call Him Bulldozer</films>
</actor>
<actor>
<first-name>Terence</first-name>
<last-name>Hill</last-name>
<films>God Forgives... I Don’t, Double Trouble, Lucky Luke</films>
</actor>
</actors>' ::xml),
actors(actor_id, first_name, last_name, films) AS (...),
films(actor_id, first_name, last_name, film_id, film) AS (...)
SELECT *
FROM films
|
Easy.
Then, we do some XPath magic to extract the individual values from the XML structure and put those into columns:
1
2
3
4
5
6
7
8
9
10
11
12
13
|
WITH RECURSIVE
x(v) AS ( SELECT '...' ::xml),
actors(actor_id, first_name, last_name, films) AS (
SELECT
row_number() OVER (),
(xpath( '//first-name/text()' , t.v))[1]::TEXT,
(xpath( '//last-name/text()' , t.v))[1]::TEXT,
(xpath( '//films/text()' , t.v))[1]::TEXT
FROM unnest(xpath( '//actor' , ( SELECT v FROM x))) t(v)
),
films(actor_id, first_name, last_name, film_id, film) AS (...)
SELECT *
FROM films
|
Still easy.
Finally, just a bit of recursive regular expression pattern matching magic, and we’re done!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
WITH RECURSIVE
x(v) AS ( SELECT '...' ::xml),
actors(actor_id, first_name, last_name, films) AS (...),
films(actor_id, first_name, last_name, film_id, film) AS (
SELECT actor_id, first_name, last_name, 1,
regexp_replace(films, ',.+' , '' )
FROM actors
UNION ALL
SELECT actor_id, a.first_name, a.last_name, f.film_id + 1,
regexp_replace(a.films, '.*' || f.film || ', ?(.*?)(,.+)?' , '\1' )
FROM films AS f
JOIN actors AS a USING (actor_id)
WHERE a.films NOT LIKE '%' || f.film
)
SELECT *
FROM films
|
Let’s conclude:
- ۹۵/۰۳/۱۱