COMPROBACIÓN DE ASERTOS EN BASES
DE DATOS RELACIONALES

MIHÁLY PALENIK

GRADO EN INGENIERÍA INFORMÁTICA
FACULTAD DE INFORMÁTICA

DEPARTAMENTO DE SISTEMAS INFORMÁTICOS Y
PROGRAMACIÓN

Universidad Complutense de Madrid

TRABAJO FIN DE GRADO

February 1, 2017

Director:
Rafael Caballero Roldán


Contents

Abstract i

Resumen ii

1 Introduction 1

2 Relational algebra 5
2.0.1 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.0.2 Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.0.3 Extended relational algebra . . . . . . . . . . . . . . . . . 6
2.0.4 Additional operation in extended relational algebra . . . . 6

3 Assertion language grammar 8

4 Formal definition of assertions 10
4.0.1 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.0.2 Element checking . . . . . . . . . . . . . . . . . . . . . . . 14

5 Implementation details 15
5.0.1 Assertions interpreted in SQL query . . . . . . . . . . . . 16
5.0.2 Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

6 User documentation 21
6.0.1 Config file . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

7 Working process 24

8 Conclusions and Future Work 25

9 Bibliography 27


Abstract

In this work we consider the problem of detecting errors in large sets of SQL
relations. In order to detect possible bugs the user can introduce assertions using
a simple, set-like language indicating properties like inclusion or membership.
Then, the system checks these assertions, reporting to the user if any assertion
violation is detected. The assertions include options that allow the system to
consider relations both as sets and as multisets and also to take the tuple order
into account. These options can be included by the user at the same time the
assertions are defined. We present a working prototype developing these ideas.

Keywords: relational databases, postgresql, assertions, debugging,
relational algebra, Java, ANTLR, multisets, testing, SQL views

i


Resumen

En este trabajo consideramos el problema de la detección de errores en grandes
conjuntos de relaciones SQL. Para facilitar la detección de errores, el usuario
puede introducir asertos utilizando un lenguaje sencillo con notación conjun-
tista, que permite definir propiedades como la inclusión o pertenencia. El sis-
tema comprueba estos asertos e informa al usuario si se detecta que algún aserto
no se verifica. Los asertos incluyen opciones que permiten al sistema considerar
las relaciones tanto como conjuntos como si se tratara de multiconjuntos, a la
vez que se permite tener en cuenta el orden entre las tuplas. Estas opciones
se especifican por el usuario a la vez que se definen los asertos. Presentamos
también un prototipo que desarrolla estas ideas.

Palabras clave: bases de datos relacionales, postgresql, asertos,
depuración, álgebra relacional, Java, ANTLR, multiconjuntos, prue-
bas, vistas SQL

ii


Chapter 1

Introduction

Nowadays databases are huge, it is hard to understand the whole one especially
in case of lot of views. Unfortunately developers sometimes make mistakes that
cause wrong queries. This kind of problems are hard to detect, mainly when
the original tables and views contain plenty of data. But what can developers
do? How can errors are detected? It may happen, that developers realize the
generated view is wrong, but the query which is generate this view is certainly
good, then what is wrong? Views, which are part of the query, what generate
the view? A lot of debugging and time are required to find the error source.

Is there some way to make debugging easier? In this work we propose to use
assertions. Sometimes we do not know that two views are correct or not but
know relation between them. For example, we consider two view: countries in
Europe and countries in Eurasia. It is trivial that European countries are also
Eurasian countries. Therefore, we can introduce an assertion representing this
inclusion relation. If the assertion fails, then we find the symptom of a bug, and
we can start to analyze views employed in the assertion.

Assertions can be checked in a dynamic way. For instance, it is possible that
some views are incorrect but the particular instance of data do not show this
mistake. However, if we modify the original tables by adding more data, it is
possible that the error shows up. Thus assertions should not be considered a
static property that can be checked just the first time, but properties that can
be checked each time the database instance change.

As formal language supporting our assertions we consider relational algebra,
representing assertions as first order formulas over set predicates such as inclu-
sion and constants like the empty set. In particular we employ the extended
relational algebra with its well-founded semantics to model the database asser-
tions. The difference between relational algebra and extended relational algebra
is that while in the former relations are considered sets, the extended version

1


2 CHAPTER 1. INTRODUCTION

works with multisets and includes more operations for queries. Our assertion
language is based on set operations of relational algebra.

In our setting, assertion can take two different forms. The first one involves
two relations and one operator:

• <= : right inclusion

• < : right strict inclusion

• => : left inclusion

• > : left strict inclusion

• = : equality

• <> : inequality

Our previous example can be easily express with this form. Assume Eu-
ropean countries and Eurasian countries has eur countries and eas countries
names in database. Then assertion will be:

eur countries <= eas countries

Originally, we can think of this tables like sets, but it can be add rep argument
to switch to the multiset interpretation. These arguments are included between
square brackets in this form. Another one is order which means set or multiset
has ordering. This is usually defined by ORDER BY or storage of table.

The second form of assertions is element checking, with the form

(A1, A2, . . . , An) in Relation

where (A1, A2, . . . , An) is a tuple representation of one row in a relation. Mean-
ing of this is identical with element checking of set.

Let us see an example. Suppose that we have a countries table with some
country entries, and two user views are created: ord pop ms which is mid-
dle sized countries (between 200000 and 700000 km2) ordered by area and
ord countries which is countries table ordered by area. These are possible
instances of these relations:

name continent area population is monarchy
’Hungary’ ’Europe’ 93036 9893899 false

’Spain’ ’Europe’ 504782 46449565 true
’UK’ ’Europe’ 244820 65102385 true
’Italy’ ’Europe’ 301318 60233948 false

’Croatia’ ’Europe’ 56594 4284889 false
’Poland’ ’Europe’ 312679 38483957 false


3

’Estonia’ ’Europe’ 45226 1313271 false
’Finland’ ’Europe’ 338145 5501043 false
’Mexico’ ’America’ 1972550 120286655 false

’Venezuela’ ’America’ 916445 30405207 false
’Canada’ ’America’ 9984670 35702707 true

’Columbia’ ’America’ 1141748 47704427 false
’Brazil’ ’America’ 8547404 200361925 false
’Chile’ ’America’ 756950 17619708 false

’Uruguay’ ’America’ 176215 3407062 false
’Guatemala’ ’America’ 108890 15468203 false

’Japan’ ’Asia’ 377915 127110047 true
’Nepal’ ’Asia’ 147181 30986975 false

’Bhutan’ ’Asia’ 47000 753947 true
’India’ ’Asia’ 3287263 1263200000 false

’Mongolia’ ’Asia’ 1564116 2953190 false
’Ghana’ ’Africa’ 238540 25904598 false

’Zimbabwe’ ’Africa’ 390757 14149648 false
’Zambia’ ’Africa’ 752614 14309466 false

Table 1.1: countries table

name continent area population is monarchy
Ghana Africa 238540 25904598 f

UK Europe 244820 65102385 t
Italy Europe 301318 60233948 f

Poland Europe 312679 38483957 f
Finland Europe 338145 5501043 f
Japan Asia 377915 127110047 t

Zimbabwe Africa 390757 14149648 f
Spain Europe 504782 46449565 t

Table 1.2: ord pop ms view

name continent area population is monarchy
Estonia Europe 45226 1313271 f
Bhutan Asia 47000 753947 t
Croatia Europe 56594 4284889 f
Hungary Europe 93036 9893899 f

Guatemala America 108890 15468203 f
Nepal Asia 147181 30986975 f

Uruguay America 176215 3407062 f
Ghana Africa 238540 25904598 f

UK Europe 244820 65102385 t
Italy Europe 301318 60233948 f


4 CHAPTER 1. INTRODUCTION

Poland Europe 312679 38483957 f
Finland Europe 338145 5501043 f
Japan Asia 377915 127110047 t

Zimbabwe Africa 390757 14149648 f
Spain Europe 504782 46449565 t

Zambia Africa 752614 14309466 f
Chile America 756950 17619708 f

Venezuela America 916445 30405207 f
Columbia America 1141748 47704427 f
Mongolia Asia 1564116 2953190 f
Mexiko America 1972550 120286655 f
India Asia 3287263 1263200000 f
Brasil America 8547404 200361925 f

Canada America 9984670 35702707 t

Table 1.3: ord countries view

Now let us define our assertion.

ord pop ms <= ord countries [ord]

It is easy to check that this assertion is true. But what if we introduce
countries table in place of ord countries? The assertion will fail because of ord
argument. This is really an inclusion but it has different ordering. You can see
in coutries table the entry of UK is earlier appear than entry of Ghana, which
is the first row of ord pop ms. But without ord argument the assertion is true.


Chapter 2

Relational algebra

The theoretical bases of assertion is extended relational algebra. In this work
almost all assertions are expressed in this setting because it is easy to make
connection between relational algebra and SQL queries. Before we define the
exact meaning of each assertion, let us see relational algebra. As I mentioned it
is an algebra with well-founded semantics to be modeling database’s query. All
relation is set of tuples. Relational algebra defines operations on this sets.

2.0.1 Operations

Set operations Because relation is set we can use set operation which is:

• ∪ : union

• ∩ : intersection

• \ : difference

Projection Assume we have a relation scheme R(A1, . . . , AN ) and we want
to see just specific column. In this case we can use projection (π). The result
relation has same number of row like the original one but just with the specific
column. For example if we just want to see 2nd and 5th column we can write
πA2,A5

(R).

Selection Selection (σcond) is an operation of selection based on a condition.
In the condition logical operation can be used. For example consider relation
with Countries(name, continent) scheme and collect European countries. The
result expression is σcontinent=′Europe′(Countries).

Rename When we use some of the above operations then it is possible to add
name for the created relation with rename (ρname). Or we can modify whole
scheme of relation also with form of ρname(A1,...,AN ).

5


6 CHAPTER 2. RELATIONAL ALGEBRA

2.0.2 Joins

Cartesian product This (×) is also a set operation between two sets. First,
take the columns of the first relation and after take the second one. Column
number of result relation is the sum of the columns in the two relations. For
example we have R1(A1, . . . , AN ) and R2(B1, . . . , BN ) schemes then R1 × R2

will be (A1, . . . , AN , B1, . . . , BN ). Columns name have an additional prefix –
relation name – with dot, if there are some collision between columns name.

Natural join It (./) can be useful when two relation schemes have one or
more columns with the same name. In that case the generated relation has all
columns from original relations, but columns with the same name appear just
once. Rows, where values of these columns are equal, are merged into one row.
Assume R1(A1, A2, B1) and R2(B1, B2) relation schemes, then R1 ./ R2 will
has (A1, A2, B1, B2) scheme, and rows are the merge of two rows where data
from B1 is equal in the R1 and R2.

Theta join Theta join (./θ) is similar like Cartesian product but it has a
selection which is defined in θ part. If R1 and R2 a relation then theta join is
R1 ./θ R2 = σθ(R1 ×R2).

2.0.3 Extended relational algebra

Extended relational algebra is exactly the same like relational algebra except it
works with multisets. It is like set but it allows to have repetition. The above
operators and joins work in a similar way as I am defined but it has additional
operation.

2.0.4 Additional operation in extended relational algebra

Grouping Grouping (γcolumns) can be useful when we have multiple row
which are equal. In columns part you can enumerate columns, and grouping is
based on these ones.

Ordering You can define ordering with τcolumns. It is based on columns which
is enumerated in columns part.

Distinct With distinct operator (δ) multisets can be mapped to set. It just
drops the duplicates. This is the relation between relational algebra and ex-
tended relational algebra.

Aggregate functions Aggregate function can be added in a condition or
columns part. It is used with grouping. There are five ones and all has one
parameter which is a column name.

• AVG : average of the column’s value


7

• SUM : sum of the column’s value

• MIN : minimum value of the column

• MAX : maximum value of the column

• COUNT : counts values of the column


Chapter 3

Assertion language
grammar

In the introduction I added an informal definition of assertion language gram-
mar. I use BNF form which is a notation technique for context-free grammar,
which is usually used to define syntax for programming languages.

For implementation I use ANTLRv4 as you will see in implementation details.
It can be easily defined your language syntax with that tool, and generate source
code to handle the process of syntax tree building.

It can be seen below the BNF form of that grammar.

〈assertion〉 ::= 〈relation〉 〈operator〉 〈relation〉 ’[’ 〈arguments〉 ’]’
| ’(’ 〈row〉 ’)’ ’in’ 〈relation〉

〈arguments〉 ::= 〈arguments〉 ’,’ 〈arguments〉
| ’order’
| ’rep’

〈row〉 ::= 〈row〉 ’,’ 〈row〉
| 〈bool〉
| 〈string〉
| 〈int〉

〈bool〉 ::= ’true’
| ’false’

8


9

〈string〉 ::= ”’ ([a-z]|[A-Z]|[0-9])* ”’

〈int〉 ::= [0-9]+

〈relation〉 ::= ([a-z]|[A-Z]|’ ’)[’.’|’ ’|’$’|0-9|a-z|A-Z]*

〈operator〉 ::= ’=’
| ’<>’
| ’<=’
| ’=>’
| ’<’
| ’>’

The structure of assertion grammar is easy now. But it can be extend. The
first rule assertion contains all possible form of assertions, which are now two.
But if it is necessary to extend with more operator, it just have to be added to
OPERATOR rule.


Chapter 4

Formal definition of
assertions

For a better implementation it is important to define well the meaning of as-
sertions. It is not enough informal explanation because that causes a lot of
misunderstanding. In the next subsections you get to know all formal definition
of assertion which is in relational algebra form.

4.0.1 Operators

In this section all assertion without arguments (order, rep) are defined with ex-
tended relational algebra. This algebra is well defined and it is possible to check
SQL query – implementation – and assertion equivalency. Extended relational
algebra works with multisets, that means database’s relations have duplicates.
First inclusion is considered. It is equivalent with inclusion of set. Because of
this we must eliminate duplicates, and have to define inclusion with relational
algebra.

A ⊆ B ≡ A \B = ∅

This can be expressed in relational algebra.

Definition (Inclusion).

R1 <= R2
def
= δ(R1) \ δ(R2) = ∅

In the left side the inclusion operator (<=) is seen, and in the right side it
is the definition of inclusion which is given by extended relational algebra. δ is
necessary because of conversion multiset to set. All rest of assertion is based on
inclusion assertion. Consider the equality, in mathematical sense is expressed
in two way of inclusion and this is exactly the equality assertion.

10


11

Definition (Equality).

R1 = R2
def
= R1 < = R2 ∧R2 <= R1 =

δ(R1) ⊆ δ(R2) ∧ δ(R1) ⊇ δ(R2)

Inequality checking is opposite meaning of equality. In the right-hand side
the equality is an assertion which easily can be traced back to mathematical
definition.

Definition (Inequality).

R1 <> R2
def
= ¬(R1 = R2)

Strict inclusion is similar to inclusion but the two set must not be equal.

Definition (Strict inclusion).

R1 < R2
def
= R1 <> R2 ∧R1 <= R2

The rep argument

In grammar’s argument part it can be chosen rep argument which is abbrevia-
tion of repetition. In general the meaning is the repetition takes into considera-
tion. This part is where nature of multiset are used. Multiset inclusion – signed
with ⊆m – is also expressed with multiset difference, and connection with set
functions have been already defined.

A ⊆m B ≡ A \m B = ∅

In below definition of inclusion the right-hand side is an extended relational
algebra expression, not an assertion.

Definition (Assertions with rep argument).

R1 <= R2[rep]
def
= R1 \R2 = ∅

R1 = R2[rep]
def
= R1 <= R2[rep] ∧R2 <= R1[rep]

R1 <> R2[rep]
def
= ¬(R1 = R2[rep])

R1 < R2[rep]
def
= R1 <> R2[rep] ∧R1 <= R2[rep]

The order argument

Next argument is order. Unfortunately extended relational algebra does not
have ordering ([Wid09]). That is why added an extra column - called #O - to
all relation which is number of sequence from one to the number of rows. These
numbers simulate row ordering which is equal with the row number in a real
database. Thus originalR(A1, . . . , AN ) relation is consideredR(#O,A1, . . . , AN ).
Because of lack of rep argument, it should be avoid the duplicates from the re-
lation and keep ordering.


12 CHAPTER 4. FORMAL DEFINITION OF ASSERTIONS

δo function To reach the required relation we have to use δo function which
keeps just the first appearance of element.

Definition.

δo(R(A1, . . . , An)) = γCOUNT (#lMin)→#O,A1,...,AN
(Prec)

where Prec is

Prec(#lMin,#gMin,A1, . . . , AN ) :=

ρFirstsNum(π#min(Firsts)) ./Firsts.#min≥FirstsNum.#min Firsts

where Firsts is

Firsts(#min,A1, . . . , AN ) = γMIN(#O),A1,...,An
(R).

F irsts is the table which contains all first appearance of different rows with
original row number, and Prec contains all data from Firsts and it is paired
with the row number of previous elements and itself. You can see an example
below which give you a better understanding. Consider this relation:

#O A1 A2 A3

1 A A B
2 B A B
3 A A B
4 B A B
5 C C B
6 C C B
7 A A B
8 D D D
9 C C B

Firsts is:

#min A1 A2 A3

1 A A B
2 B A B
5 C C B
8 D D D

The left table of Prec – FirstsNum – is:

#min
1
2
5
8

And after theta join with Firsts table the Prec is:


13

#lMin #gMin A1 A2 A3

1 1 A A B
1 2 B A B
2 2 B A B
1 5 C C B
2 5 C C B
5 5 C C B
1 8 D D D
2 8 D D D
5 8 D D D
8 8 D D D

And finally the δo:

#O A1 A2 A3

1 A A B
2 B A B
3 C C B
4 D D D

Now assertions can be defined. Be careful because now R1 and R2 is a
relation with additional #O column. That is why projection used in assertion
definition.

Definition (Assertions with order argument).

R1 <= R2[order]
def
= πA1,...,AN

(R1) <= πA1,...,AN
(R2) ∧ Cond

R1 = R2[order]
def
= R1 <= R2[order] ∧R2 <= R1[order]

R1 <> R2[order]
def
= ¬(R1 = R2[order])

R1 < R2[order]
def
= R1 <> R2[order] ∧R1 <= R2[order]

where Cond is:

σR1.#O>R2.#O(δo(R1) ./ δo(R2)) = ∅

order and rep at the same time

That is the hardest part of this section. First, I tried to express these options
also in extended relational algebra. After trying in many different ways, finally
I realized that the problem was the expressivity of relational algebra. In this
formal language, you can only express things in a global – I mean table – scope.
But these options, introduce locality which cannot be expressed with relational
algebra. This is not surprising because relational algebra is not Turing complete.
I have not proved formally that expressing these operations in relations algebra
is not possible. Thus, this remains an open question. However, in this work I
decided to give up this way. I think that even if the relational algebra expression
for these operations exist, they would be too complex and inefficient.


14 CHAPTER 4. FORMAL DEFINITION OF ASSERTIONS

Definition (Inclusion assertion with order and rep arguments). Consider two
relations:

R1(#O,A1, . . . , An), R2(#O,A1, . . . , AN ).

Then R1 <= R2[rep, order] assertion is true if

{(1, a11 , . . . , a1n), . . . , (Y, aY1 , . . . , aYn)} ∈ R1

and
{(1, b11 , . . . , a1n), . . . , (Z, bZ1 , . . . , aZn)} ∈ R2

where
Y = |R1| ∧ Z = |R2| ∧ Y ≤ Z.

Then

∀(i, ai1 , . . . , ain)(i ∈ [2..n− 1]) : ∃(j, bj1 , . . . , bjn) : jprev < j < jnext

where jprev is the value which belongs to i − 1 and jnext is the value which
belongs to i + 1. In the border is same except in case of i = 1 the condition is
j < jnext, and in case of i = n is jprev < j. The one element inclusion is trivial.

The definition may be a little hard. But think about regular expressions. If
a1, . . . , an are the rows ofR1 then the inclusion operator holds if ∗, a1, ∗, a2, . . . , ∗, an, ∗
are the rows in R2.

Now, it is easy to define the rest of the assertion.

Definition.

R1 = R2[rep, order]
def
= R1 <= R2[rep, order] ∧R2 <= R1[rep, order]

R1 <> R2[rep, order]
def
= ¬(R1 = R2[rep, order])

R1 < R2[rep, order]
def
= R1 <> R2[rep, order] ∧R1 <= R2[rep, order]

4.0.2 Element checking

The membership assertion follows a different syntax. It is defined by a tuple,
which represents one row of relation, and the relation where the tuple should
be.

Definition. Assume we have a relation with scheme R(A1, . . . , Am). Then

(a1, . . . , an) in R

assertion is true if

n = m ∧ ∃(b1, . . . , bm) ∈ R : ∀i ∈ [1..n] : ai = bi


Chapter 5

Implementation details

In implementation part of assertion a lot of tools are used. The implementation
language is Java. I did not need to create a separate program for assertions,
because this is a part of a bigger project. Therefore, I started from a framework
(called SBuggy) where I could add my implementation.

SBuggy is a debugging tool for database views. With that program, the user
can mark which views are incorrect, and SBuggy finds the related views or
tables which occur in the view query. If the tool finds a bug in the selected
view query, then the debugger finishes. However, if the query is correct, the
debugger continues debugging the previously marked views or tables. In this
project we add assertions as a new feature that improves that program ability
to help developer during debugging.

The tool ANTLRv4 is used for the grammar definition and translation into
Java. It is easy to write language syntax with that tool. The user can write
it in BNF form. ANTLR generates code which builds up a syntax tree. The
programmers can add their own part for this generated code e.g. it is possible
to add a particular error handling method or include additional information
during the tree traversal phase.

SBuggy already has graphical user interface, so I have a base to put my user
interface. It is created with Swing, which is a graphical user interface widget
toolkit for Java.

The assertions are translated into SQL queries. These queries are executed
from Java by using the JDBC interface. JDBC is an API that allows Java to
connect to external databases, and run SQL queries. The database connection
was already implemented in SBuggy, so I just had to use that connection.

15


16 CHAPTER 5. IMPLEMENTATION DETAILS

The relational database management system employed in this project is post-
greSQL, which is a free software. I did not use any special syntax in the queries,
therefore, it is not important what dialect of SQL is used. The advantage of this
is that our tool could easily work with other database systems such as MySQL
or Oracle.

5.0.1 Assertions interpreted in SQL query

In this subsection I show the equivalent SQL query for each assertion. This
is straightforward since there is a well-known connection between relational
algebra and standard SQL.

Inclusion

Because every assertion is expressed with inclusion assertion, that is why I just
have to create equivalency in that part. We assume that the mentioned relations
are stored in a database as a table or view.

Without arguments First, consider inclusion assertion without argument.
SQL has set operation like union, intersection or difference. That is why is easy
to express that part.

R1 <= R2

SELECT ∗ FROM R1
EXCEPT

SELECT ∗ FROM R2

This SQL query is exactly the same like difference in case of sets. If the
result is an empty relation then inclusion assertion is true otherwise is false.
You can easily see I tried to approximate to the mathematical definition.

rep argument Here we have to take into account repetition. Fortunately,
relational database tables already resemble multisets. And the set operations
in SQL have syntax for multiset treatment, just adding the ALL keyword.

R1 <= R2 [rep]

SELECT ∗ FROM R1
EXCEPT ALL

SELECT ∗ FROM R2

After this, the tool needs to check the result, like in the previous SQL query.
If the result is empty then assertion is true, and otherwise it is false.


17

order argument Checking assertions including the order option in SQL is
more complex. Thus, I split the code in small parts, like in the relational algebra
definition. If originally the table or view does not include an ordering section,
then ordering is the storage of rows in database. Unfortunately, if GROUP BY
is used in query then it is not guaranteed that it keeps ordering. Because of this,
I use the trick like in relational algebra to add a new column, which contains
the number from one to the last row number, and consider this the ordering.
Consider a table or view R1(A1, . . . , AN). Then, we can add that plus column
with the below SQL query:

Add ordering to R1

SELECT row number ( ) over ( ) , [ R1 columns ]
FROM R1

Result is R1, with the additional row number columns. Square brackets are
used to marks the part where all columns name are enumerated. This table can
be used to define SQL query for δo(R1) relation.

Query to create δo(R1)

SELECT row number ( ) over ( order by MIN( row number ) ) ,
[ R1 columns ]

FROM [ r e l a t i o n with a d d i t i o n a l row number ] AS tab l ey
GROUP BY [ R1 columns ] ;

This query does exactly what we define in relational algebra. With post-
greSQL we can define not just a new columns with number but we can define
how we want order. This syntax is same with ORDER BY syntax. It can be
seen I added alias for the result, it is necessary because of syntax reason. For
simplicity above query gets a name, which is reducedRelation(R1), and has one
parameter, which is the table or view name, and I will use it in my SQL query.
Think about that like a macro.

R1 <= R2 [order]

SELECT [ l e f t y . A1 , . . . , l e f t y .AN]
FROM reducedRelat ion (R1) AS l e f t y ,

reducedRelat ion (R2) AS r i gh ty
WHERE [ l e f t y . A1=r i gh ty . A1 AND . . . AND l e f t y .AN=r igh ty .AN]

AND l e f t y . row number>r i gh ty . row number

Above query is the SQL query of inclusion assertion with order. In the
last row it does exactly what is the selection part of the definition in relational
algebra. Below you can see the whole query.


18 CHAPTER 5. IMPLEMENTATION DETAILS

R1 <= R2 [order]

SELECT [ l e f t y . A1 , . . . , l e f t y .AN] FROM
(SELECT row number ( ) over ( order by MIN( row number ) ) ,

[ R1 columns ]
FROM

(SELECT row number ( ) over ( ) , ∗ FROM R1) AS tab l ey
GROUP BY [ R1 columns ] )

) AS l e f t y ,
(SELECT row number ( ) over ( order by MIN( row number ) ) ,

[ R2 columns ]
FROM

(SELECT row number ( ) over ( ) , ∗ FROM R2) AS tab l ey
GROUP BY [ R2 columns ] ) )

AS r i gh ty
WHERE [ l e f t y . A1=r i gh ty . A1 AND . . . AND l e f t y .AN=r igh ty .AN]

AND l e f t y . row number>r i gh ty . row number

But before run that query like in the definition, we have to check first the
original inclusion without arguments R1 <= R2. If it fails it is not important
to check the second query.

order and rep argument Unfortunately I could not add definition in rela-
tional algebra in that case. So implementation is based on regular expression
definition.

R1 <= R2 [order, rep]

while (R2 . next ( ) && ! R1 . i s A f t e r L a s t ( ) ) {
i f ( i sEqua l (R1 , R2) ) {

R1 . next ( ) ;
}

}

Here R1 and R2 are two ResultSet which contain the original R1 and R2
table. isEqual function is responsible for two rows equality checking.

Other assertions

As other assertions with operator (<,=, <>) are defined in relational algebra,
as it is implemented in Java. All assertion, which are not inclusion, first they
use inclusion and after other checking. For example in equality checking, first
see right inclusion and after left inclusion. This is similar in inequality, because
if equality is true then inequality is false, and as we know equality use inclusion.

Element checking This was an easy part to create SQL query from assertion.
In a tuple, which represent one row in a table, it can be written three type of


19

value, which are integer, boolean value and string between apostrophe. This is
just a trivial checking, which uses all value in WHERE clause.

5.0.2 Test

Program has an SQL script – called test db.sql – which contains SQL statements
to create test database. It can be run with psql -f [filename] command. This
script assumes, we have already a test schema in postgreSQL.

Script has create statements of one table and 13 view. The table name is
countries and has countries with its data, like area or population. It has 24
rows.

Create test.countries table

CREATE TABLE t e s t . c o u n t r i e s (
name VARCHAR( 20 ) ,
cont inent VARCHAR( 20 ) ,
area INT,
populat ion INT,
is monarchy BOOLEAN

) ;

I try to create test database, where all test have a little sense, not just
some random value. This goal sometimes is achieved sometimes not, because
of complexity of assertions. Below you can see a little explanation for all view
with name:

• eu and mon: European monarchy

• middle sized : Countries with middle sized area [200000 - 700000]

• density : Population density

• dense continents: Countries with density>100

• continents: All continents

• ord middle sized : middle sized ordered by area

• ord countries: Countries ordered by area

• ord pop ms: middle sized ordered by pop

• u ms mon: Unio of middle sized and all monarchy

• twice: Countries Descartes

• countries pop: Order by population


20 CHAPTER 5. IMPLEMENTATION DETAILS

• union pop: Twice middle sized minus eu and mon ordered by popula-
tion

• union 3 pop: Three times middle sized ordered by population

Above views and table are the test database. In [project folder]/asserts
folder you can see two files. test.txt contains all assertion, which are based
on the test database. For example test.eu and mon < test.middle sized,
which means European monarchies, is included by middle sized countries in the
database (between 200000 and 700000 km2).

Expected output is in the same directory in the out.txt file. The above
assertion is passed so in this file in the same row like assertion has an OK.
Here are other example from this file nonsense < othernonsense. Because
test database does not have nonsense and othernonsense, table SBuggy got an
exception from SQL connection. This appear in out file with ERROR.


Chapter 6

User documentation

The above detailed implementation is part of the SBuggy software. This pro-
gram has a GUI so I had a skeleton to insert my part. It appears in a different
tab.

Figure 6.1: SBuggy

On the above picture you can see the Assertions tab. In the bottom corner
there are three button Add, Remove and Check. With Add you can add one
assertion to the table, which is in the middle of the tab. Easily can remove
selected lines with Remove button. Multiselection also works. If you click on
the Check button, then SBuggy start to check all assertion. If an assertion is
already checked it is not problem, because program check that again. You can
modify an existing assertion with double click on the row.

21


22 CHAPTER 6. USER DOCUMENTATION

The bottom right corner contains the Load and Save buttons. The assertions
can be saved and loaded, using a simple text file as output format. When we
load assertions or add them directly through program, the syntax checker runs,
displaying an error message if a syntax error is found.

All error appear in the Log tab. The type of error are:

• lexical or syntactical

• failing assertion

• database errors

In the middle table you can see assertions in the first column, and the state
of assertion checking in the second column. The states are:

• NOT CHECKED: assertion does not checked

• WRONG: assertion fails

• OK: assertion succeed

• ERROR: sign there is an error meanwhile assertion checking. Mainly
errors with databases.

Figure 6.2: Assertion errors in logger

6.0.1 Config file

SBuggy has a config file (sbuggy.conf ) where specific configuration are stored
like database’s url, user name or logger info level. For assertions there are two
config key which is used: defaultFolder and assertionError.


23

defaultFolder has a value of path. It is used, when user wants to load a file,
which contains assertions. If this open dialog run first time, then initial path is
the value from key defaultFolder. If user use load operation after the first time,
then initial path is the last path from where user load assertions.

When assertion is correct syntactically and database connection and every-
thing is good (we do not get exception) but assertion fails, we want to know
why? In this case assertionError config key is important. It has a value of ALL
or ANY. If user just wants to know one example, why an assertion is failed,
ANY is used. If user wants to know all rows which fail the assertion, ALL
value is used.

Other config options

To use SBuggy comfortably, it has a lot of different config options in config file.
Mainly these are related to the database connection.

Database In url key you can set postgreSQL’s url, where you can reach the
database. It can be easily turn on SSL authentication with ssl key. And last one
database, which is also related to database connection, it contains the database
name, which you want to reach.

Logger Other config options is related to logger. Easily can be set up logger
infolevel. With logconf, which contains a path, you can add, where do you want
to save your logging.


Chapter 7

Working process

In the beginning of work the first thing was get to know SBuggy, because this
is the framework of assertion implementation. The second thing was to find
out the Assertion grammar, and mainly the question was what kind of assertion
do we need, and make sense in the industrial area. Parallel with this, start to
design user interface, and get to know Swing tool.

First implementation was to create user interface, so I had a skeleton, and
I knew what kind of function do I have to implement. This continues easier
function implementation, like load and save, adding assertion. As assertion
syntax outlined, I started to create grammar in ANTLR, and analyzed generated
output.

The hardest part was to provide a well-formed definition of assertions. In
some cases it was tricky to find a suitable definition in relational algebra. But
after that, the implementation was easier, because of equivalency between SQL
query and relational algebra. Finally I reviewed the code, re-factor and add
precise description of function in comment to facilitate the later work.

24


Chapter 8

Conclusions and Future
Work

This project was a great implementation practice in Java and a very useful
brain training. I was getting into a lot of different tool. By the way I created
an assertion language with my teacher not just theoretically way but practical
way.

I tried to do my best in this project. We had a lot of goal, and I achieved a
lot from them. But this program have to extend more. Firstly the grammar.
There are a lot of idea, which assertion structure will be useful. It is not easy
to find out. It has to discuss with a lot of professional, mainly who works in
industrial area. That is why SBuggy is different like other project. This is not
just a tool to play in university, but it is used in production environment. After
discussion, it will be easy to extend grammar, because I kept in mind to achieve
this. Now we already have an idea for new assertion, which is to check how
many rows are contained in a table. You will be add range or exact number.

Most important part now to finish, to create real connection between SBuggy
and assertion checking. Now just it connects with database, adds assertion after
check, and that is all. It is already useful but not enough. The real goal is to
mark views, which is wrong with SBuggy, and you can trace back the source of
error.

For do this connection, it has to be improve graphical user interface. With
some dialog, you will get some question about errors, which tables have wrong
rows, which tables cause the error It is also a good thing to write errors in
logger, but this is just for logging, it is not so user-friendly. Somehow has to be
find out, which is the best way to show errors. Moreover now to add assertion
through the program is really dull. A small editor – maybe special for assertion
– would be the best way to improve that part.

25


26 CHAPTER 8. CONCLUSIONS AND FUTURE WORK

Other part, which also connected to the user interface, is how to SBuggy
checks the assertions. Now if user click to check button, all assertion will be
evaluated, no matter it is already succeed or not. The first step is to modify
checking behaviour to check just failed assertions. Moreover it should be create
check all button, and separate check which is evaluate just the selected assertions
in table. Maybe it would be good to add a force checking option, which do the
same checking like now.

The current class hierarchy seems suitable in our setting, that includes a small
number of assertions. However, when it will be more, it would be a great idea
to re-factor AssertionChecker class before it will be a god class. Maybe this
one just should be responsible for handle checking, and other classes should be
create the SQL queries for assertions, and another classes which handles the
database connection.

At that moment, the SQL query representations of assertions are created
during the assertion checking phase, and SBuggy do not store them. It is a
real drawback because every time, when user starts assertion checking, program
recreates SQL queries, which is unnecessary. An easy improvement for future
version, which can really accelerate the program, is to create SQL statements
directly after syntax checking, and store the code together with assertions. In
this way the transformation from assertions into SQL, will be performed just
once, and if the assertion is changed by the user, then it has to be recreated
SQL query.

Except to find out new assertions, all thing that I mentioned, only suppose
small changes, although the final result is spectacular. I hope I can do these
changes myself, and improve SBuggy further not just in my foreign scholarship.


Chapter 9

Bibliography

[CGS12] Rafael Caballero, Yolanda Garcia-Ruiz, and Fernando Sáenz-Pérez.
Algorithmic debugging of sql views. In Proceedings of the 8th interna-
tional conference on perspectives of system informatics. Volume 7162.
In PSI’11. Springer-Verlag, Novosibirsk, Russia, 2012, pages 77–85.
isbn: 978-3-642-29708-3. doi: 10.1007/978-3-642-29709-0_9.

[Ora17] Oracle. Programming With Assertions.
http://docs.oracle.com/javase/7/docs/technotes/guides/

language/assert.html. Retrieved: 8 January 2017.

[Wid09] Jeffry D. Ullman - Jeniffer Widom. Adatbázisrendszerek - alapvetés.
Panem Kft., 2009.

27

http://dx.doi.org/10.1007/978-3-642-29709-0_9
http://docs.oracle.com/javase/7/docs/technotes/guides/language/assert.html
http://docs.oracle.com/javase/7/docs/technotes/guides/language/assert.html

	Abstract
	Resumen
	Introduction
	Relational algebra
	Operations
	Joins
	Extended relational algebra
	Additional operation in extended relational algebra


	Assertion language grammar
	Formal definition of assertions
	Operators
	Element checking


	Implementation details
	Assertions interpreted in SQL query
	Test


	User documentation
	Config file

	Working process
	Conclusions and Future Work
	Bibliography