You can cut and paste the code on this page and test it on the online interpreter.

Queries

CDuce: documentation: Tutorial: Queries

Select from where

CDuce is endowed with a select_from_where syntax to perform some SQL-like queries. The general form of select expressions is:

select e from
   p1 in e1,
   p2 in e2,
       :
   pn in en
where b

where e is an expression, b a boolean expression, the pi's are patterns, and the ei's are sequence expressions.

The select_from_where construction is translated into:

transform e1 with p1 -> 
   transform e2 with p2 -> 
         ...
       transform en with pn -> 
          if b then  [e] else []
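
As a concrete illustration of this translation, here is a minimal sketch on integer sequences. The names q1 and q2 and the data are made up for the example; both definitions are expected to denote the same sequence of pairs.

```cduce
(* A small query over two integer sequences. *)
let q1 = select (x, y)
         from x in [1 2 3],
              y in [10 20]
         where x <= 2

(* The same query written with nested transform,
   following the translation scheme given above. *)
let q2 = transform [1 2 3] with x ->
           transform [10 20] with y ->
             if x <= 2 then [ (x, y) ] else []
```

In both forms the pairs are produced in the order of the nested iteration: x varies slowest and y fastest.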

XPath-like expressions

XPath-like expressions come in three forms: e/t, e/@a, and e//t, where e is an expression, t a type, and a an attribute.

The first two are syntactic sugar for, respectively:

 flatten(select x from <_>[(x::t | _ )*] in e) 

and

select x from <_ a=x>_ in  e

Examples

Types and data for the examples

Let us consider the following types representing a bibliography:

type Biblio  = <bibliography>[Heading Paper*]
type Heading = <heading>[ PCDATA ]
type Paper   = <paper>[ Author+ Title Conference File ]
type Author  = <author>[ PCDATA ]
type Title   = <title>[ PCDATA ]
type Conference = <conference>[ PCDATA ]
type File    = <file>[ PCDATA ]

and some values

let bib : Biblio = 
  <bibliography>[
    <heading>"Alain Frisch's bibliography"
    <paper>[
      <author>"Alain Frisch"
      <author>"Giuseppe Castagna"
      <author>"Veronique Benzaken"
      <title>"Semantic subtyping"
      <conference>"LICS 02"
      <file>"semsub.ps.gz"
    ]
    <paper>[
      <author>"Mariangiola Dezani-Ciancaglini"
      <author>"Alain Frisch"
      <author>"Elio Giovannetti"
      <author>"Yoko Motohama"
      <title>"The Relevance of Semantic Subtyping"
      <conference>"ITRS'02"
      <file>"itrs02.ps.gz"
    ]
    <paper>[
      <author>"Veronique Benzaken"
      <author>"Giuseppe Castagna"
      <author>"Alain Frisch"
      <title>"CDuce: a white-paper"
      <conference>"PLANX-02"
      <file>"planx.ps.gz"
    ]
 ]

Projections

All titles in the bibliography bib

let titles = [bib]/<paper>_/<title>_

Which yields:

val titles : [ <title>[ Char* ]* ] = [ <title>[ 'Semantic subtyping' ]
  <title>[ 'The Relevance of Semantic Subtyping' ]
  <title>[ 'CDuce: a white-paper' ]
  ]
Ok.
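
Using the desugaring of e/t given in the section on XPath-like expressions, the projection above could be spelled out step by step as follows. This is only a sketch: the names papers_seq and titles_long are introduced here for illustration, and the type the interpreter infers may be printed differently.

```cduce
(* [bib]/<paper>_ : collect the <paper> children of each element of [bib] *)
let papers_seq = flatten(select x from <_>[(x::<paper>_ | _)*] in [bib])

(* .../<title>_ : collect the <title> children of each paper *)
let titles_long = flatten(select x from <_>[(x::<title>_ | _)*] in papers_seq)
```

Each select returns one captured subsequence per element of its input, which is why flatten is needed to obtain a single sequence.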

All authors in the bibliography bib

let authors = [bib]/<paper>_/<author>_

Yielding the result:

val authors : [ <author>[ Char* ]* ] = [ <author>[ 'Alain Frisch' ]
  <author>[ 'Giuseppe Castagna' ]
  <author>[ 'Veronique Benzaken' ]
  <author>[ 'Mariangiola Dezani-Ciancaglini' ]
  <author>[ 'Alain Frisch' ]
  <author>[ 'Elio Giovannetti' ]
  <author>[ 'Yoko Motohama' ]
  <author>[ 'Veronique Benzaken' ]
  <author>[ 'Giuseppe Castagna' ]
  <author>[ 'Alain Frisch' ]
  ]
Ok.

All papers in the bibliography bib

let papers = [bib]/<paper>_

Yielding:

val papers : [ <paper>[ Author+ Title Conference File ]* ] = [ <paper>[
    <author>[ 'Alain Frisch' ]
    <author>[ 'Giuseppe Castagna' ]
    <author>[ 'Veronique Benzaken' ]
    <title>[ 'Semantic subtyping' ]
    <conference>[ 'LICS 02' ]
    <file>[ 'semsub.ps.gz' ]
    ]
  <paper>[
    <author>[ 'Mariangiola Dezani-Ciancaglini' ]
    <author>[ 'Alain Frisch' ]
    <author>[ 'Elio Giovannetti' ]
    <author>[ 'Yoko Motohama' ]
    <title>[ 'The Relevance of Semantic Subtyping' ]
    <conference>[ 'ITRS\'02' ]
    <file>[ 'itrs02.ps.gz' ]
    ]
  <paper>[
    <author>[ 'Veronique Benzaken' ]
    <author>[ 'Giuseppe Castagna' ]
    <author>[ 'Alain Frisch' ]
    <title>[ 'CDuce: a white-paper' ]
    <conference>[ 'PLANX-02' ]
    <file>[ 'planx.ps.gz' ]
    ]
  ]
Ok.

Select_from_where

The same queries we wrote above can of course be programmed with the select_from_where construction.

All the titles

let tquery = select y 
             from x in [bib]/<paper>_ ,
                  y in [x]/<title>_

This query is programmed in an XQuery-like style that relies largely on projections. Note that x and y are CDuce patterns. The result is:

val tquery : [ <title>[ Char* ]* ] = [ <title>[ 'Semantic subtyping' ]
  <title>[ 'The Relevance of Semantic Subtyping' ]
  <title>[ 'CDuce: a white-paper' ]
  ]

Now let's program the same query with the translation given previously, thus eliminating the y variable:

let withouty = flatten(select [x] from x in [bib]/<paper>_/<title>_)

Yielding:


val withouty : [ <title>[ Char* ]* ] = [ <title>[ 'Semantic subtyping' ]
  <title>[ 'The Relevance of Semantic Subtyping' ]
  <title>[ 'CDuce: a white-paper' ]
  ]

Ok.

But select_from_where expressions are more likely to be used for complex queries, such as the one that selects the titles of all papers having at least one author equal to "Alain Frisch" or "Veronique Benzaken":

let sel = select y 
          from x in [bib]/<paper>_ ,
               y in [x]/<title>_,
               z in [x]/<author>_
where z = <author>"Alain Frisch" or z = <author>"Veronique Benzaken"

Which yields:

val sel : [ <title>[ Char* ]* ] = [ <title>[ 'Semantic subtyping' ]
  <title>[ 'Semantic subtyping' ]
  <title>[ 'The Relevance of Semantic Subtyping' ]
  <title>[ 'CDuce: a white-paper' ]
  <title>[ 'CDuce: a white-paper' ]
  ]

Ok.

Note that the corresponding semantics, as in SQL, is a multiset one. Thus duplicates are not eliminated. To discard them, one has to use the distinct_values operator.

A pure pattern example

This example computes the same result as the previous query, except that duplicates are eliminated. It is written in a pure pattern form (i.e., without any XPath-like projections):

let sel = select t
from <_>[(x::<paper>_ | _ )*] in [bib],
     <_>[ _* (<author>"Alain Frisch" | <author>"Veronique Benzaken")  _*  (t&<title>_ ); _]  in x  


Note the pattern on the second line of the from clause. Since each element of x has type <paper>[ Author+ Title Conference File ], the pattern first matches any tag with <_>, then skips authors with _* until it finds either Alain Frisch or Veronique Benzaken (<author>"Alain Frisch" | <author>"Veronique Benzaken"), skips the remaining authors with a second _*, captures the corresponding title with (t&<title>_), and finally ignores the tail of the sequence (the trailing ; _). Because each paper matches the pattern at most once, no title is returned twice.

Result:

val sel : [ <title>[ Char* ]* ] = [ <title>[ 'Semantic subtyping' ]
  <title>[ 'The Relevance of Semantic Subtyping' ]
  <title>[ 'CDuce: a white-paper' ]
  ]

Ok.
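
The same pattern can also be used outside a select. The following is a minimal sketch, assuming the types and the value bib defined above; the names title_if_author and sel2 are hypothetical and introduced only for this illustration.

```cduce
(* Hypothetical helper: returns the title of a paper if one of its authors
   is Alain Frisch or Veronique Benzaken, and the empty sequence otherwise. *)
let title_if_author (p : Paper) : [Title?] =
  match p with
    <_>[ _* (<author>"Alain Frisch" | <author>"Veronique Benzaken") _* (t & <title>_) ; _ ] -> [t]
  | _ -> []

(* Apply the helper to every paper; transform concatenates the results. *)
let sel2 = transform [bib]/<paper>_ with p -> title_if_author p
```

Under these assumptions sel2 is expected to contain the same titles as sel, each of them once.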

This pure pattern form of the query yields (in general) better performance than the same one written in an XQuery-like programming style. However, the query optimiser automatically translates the latter into a pure pattern one.

Joins

This example is the exact transcription of query Q5 of XQuery use cases. We first give the corresponding CDuce types. We leave the user in charge of creating the corresponding relevant values.

type Bib = <bib>[Book*] 
type Book = <book year=String>[Title (Author+ | Editor+ ) Publisher Price]
type Author = <author>[Last First]
type Editor = <editor>[Last First Affiliation]
type Title  = <title>[PCDATA]
type Last  = <last>[PCDATA]
type First  = <first>[PCDATA]
type Affiliation = <affiliation>[PCDATA]
type Publisher  = <publisher>[PCDATA]
type Price  = <price>[PCDATA]
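
The queries below also mention Entry, a reviews document named amazon, and a bibliography named biblio, none of which are defined on this page. The following is a minimal sketch of what they could look like. The types Reviews, Entry and Review are assumptions modelled on the shape the queries expect, and the sample values are made up for illustration. Since Author and Title are defined differently in the previous examples, this sketch is meant to be tried in a separate session.

```cduce
(* Hypothetical types for the reviews document; not given in the original. *)
type Reviews = <reviews>[Entry*]
type Entry   = <entry>[Title Price Review]
type Review  = <review>[PCDATA]

(* Illustrative bibliography with a single book. *)
let biblio : Bib =
  <bib>[
    <book year="1994">[
      <title>"TCP/IP Illustrated"
      <author>[ <last>"Stevens" <first>"W." ]
      <publisher>"Addison-Wesley"
      <price>"65.95"
    ]
  ]

(* Illustrative reviews document with one entry for the same title. *)
let amazon : Reviews =
  <reviews>[
    <entry>[
      <title>"TCP/IP Illustrated"
      <price>"65.95"
      <review>"A sample review."
    ]
  ]
```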

The queries are expressed first in an XQuery-like style, then in a pure pattern style: the first pattern-based query is the one produced by the automatic translation of the XQuery-style one. The last query corresponds to a pattern-aware programmer's version.

XQuery style

<books-with-prices>
select <book-with-price>[t1 
              <price-amazon>([p2]/_) <price-bn>([p1]/_)]
from b in [biblio]/Book ,
     t1 in [b]/Title,
     e in [amazon]/Entry,
     t2 in [e]/Title,
     p2 in [e]/Price,
     p1 in [b]/Price
 where t1=t2 

Automatic translation of the previous query into a pure pattern (thus more efficient) one:

<books-with-prices>
select <book-with-price>[t1 <price-amazon>x11 <price-bn>x10 ]
from <_>[(x3::Book|_)*] in [biblio],
     <_>[(x9::Price|x5::Title|_)*] in x3,
     t1 in x5,
     <_>[(x6::Entry|_)*] in [amazon],
     <_>[(x7::Title|x8::Price|_)*] in x6,
     t2 in x7,
     <_>[(x10::_)*] in x9,
     <_>[(x11::_)*] in x8
 where t1=t2

Pattern-aware programmer's version of the same query (hence hand-optimised). This version of the query is very efficient. Be aware of patterns.

<books-with-prices>
select <book-with-price>[t2 <price-amazon>p2 <price-bn>p1]
from <bib>[b::Book*] in [biblio],
     <book>[t1&Title _* <price>p1] in b,
     <reviews>[e::Entry*] in [amazon],
     <entry>[t2&Title <price>p2 ;_] in e
where t1=t2

More complex Queries: on the power of patterns

<bib>
    select <book (a)> x
      from <book (a)>[ (x::(Any\Editor)|_ )* ] in bib 

This expression returns all books in bib, but with the editor elements removed. If one wants to write this more explicitly:

    select <book (a)> x
      from <book (a)>[ (x::(Any\<editor>_)|_ )* ] in bib

Or even:

    select <book (a)> x
      from <book (a)>[ (x::(<(_\`editor)>_)|_ )* ] in bib

Back to the first one:

<bib>
    select <book (a)> x
      from <(book) (a)>[ (x::(Any\Editor)|_ )* ] in bib

This query takes any element in bib, transforms it into a book element and removes its editor sub-elements (but you will get a warning, as the capture variable book in the from clause is never used).

    select <(book) (a)> x
      from <(book) (a)>[ (x::(Any\Editor)|_ )* ] in bib

Same thing, but without transforming the tag to "book". More interestingly:

    select <(b) (a\id)> x
      from <(b) (a)>[ (x::(Any\Editor)|_ )* ] in bib

This removes the "id" attribute (if any) from the attributes of each element in bib.

    select <(b) (a\id+{bing=a.id})> x
      from <(b) (a)>[ (x::(Any\Editor)|_ )* ] in bib

This changes the attribute id=x into bing=x. However, one must be sure that each element in bib has an "id" attribute: if this is not the case, the expression is ill-typed. If one wants to perform this only for those elements which certainly have an "id" attribute, then:

    select <(b) (a\id+{bing=a.id})> x
      from <(b) (a&{id=_})>[ (x::(Any\Editor)|_ )* ] in bib
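
The attribute expressions used above are ordinary record operations: field removal, record concatenation and field selection. A minimal sketch on a standalone record, where the names attrs, no_id and renamed and the field values are illustrative only:

```cduce
(* A sample attribute record; the field names and values are illustrative. *)
let attrs = { id="b1"; year="1994" }

(* Field removal: drops the id field and keeps year. *)
let no_id = attrs \ id

(* Field removal followed by record concatenation:
   the value of id is re-attached under the name bing. *)
let renamed = (attrs \ id) + { bing=attrs.id }
```

The selection attrs.id is only well-typed when the record is known to have an id field, which is the reason for the extra pattern {id=_} in the last query.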

An unorthodox query: Formatted table generation

The following program generates a 10x10 multiplication table:

let bg ((Int , Int) -> String) 
  (y, x) -> if (x mod 2 + y mod 2 <= 0) then "lightgreen" 
            else if (y mod 2 <= 0) then "yellow"
            else if (x mod 2 <= 0) then "lightblue"
            else "white";;

<table border="1">
  select <tr> select <td align="right" style=("background:"@bg(x,y)) >[ (x*y) ]
              from y in [1 2 3 4 5 6 7 8 9 10] : [1--10*] 
  from x in [1 2 3 4 5 6 7 8 9 10] : [1--10*];;
 

The result is the xhtml code that generates the following table:

  1   2   3   4   5   6   7   8   9  10
  2   4   6   8  10  12  14  16  18  20
  3   6   9  12  15  18  21  24  27  30
  4   8  12  16  20  24  28  32  36  40
  5  10  15  20  25  30  35  40  45  50
  6  12  18  24  30  36  42  48  54  60
  7  14  21  28  35  42  49  56  63  70
  8  16  24  32  40  48  56  64  72  80
  9  18  27  36  45  54  63  72  81  90
 10  20  30  40  50  60  70  80  90 100
