---------------------------------------------------------
SQLITE DRIVER IDEAS, ISSUES, PROPOSALS
Copyright (C) 2003 Jaroslaw Staniek js at iidea dot pl
Started: 2003-07-09
Kexi home page: http://www.koffice.org/kexi/
---------------------------------------------------
1. In most situations (especially during massive data operations) we do not want to retrieve column types,
so:
PRAGMA show_datatypes = OFF;
2. SQLite automatically adds a primary key to a table that has no such key.
This pkey column is not visible to statements like 'select * from table';
'select oid,* from table' needs to be executed to also get this special column.
See section '3.1 The ROWID of the most recent insert' of the c_interface.html file.
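A minimal sketch of the hidden key column, assuming Python's built-in sqlite3 module (which links SQLite 3, not the SQLite 2 library these notes were written against). The function name is illustrative only; cursor.lastrowid plays the role of "the ROWID of the most recent insert".

```python
import sqlite3

def demo_hidden_rowid():
    """Insert two rows and compare SELECT * with SELECT oid, *."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE persons (name VARCHAR, city INTEGER)")
    cur = conn.execute("INSERT INTO persons VALUES ('Jarek', 1)")
    last_id = cur.lastrowid  # ROWID of the most recent insert on this cursor
    conn.execute("INSERT INTO persons VALUES ('Jakub', 1)")
    # The implicit key column is not part of '*'...
    star = conn.execute("SELECT * FROM persons ORDER BY name").fetchall()
    # ...but it can be requested explicitly under its alias 'oid' (or 'rowid').
    with_oid = conn.execute("SELECT oid, * FROM persons ORDER BY oid").fetchall()
    conn.close()
    return last_id, star, with_oid
```

Calling demo_hidden_rowid() returns (1, [('Jakub', 1), ('Jarek', 1)], [(1, 'Jarek', 1), (2, 'Jakub', 1)]): the '*' rows carry no key, while the 'oid' rows do.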
3. For smaller tables (how small? -- add a configuration option for this) the 'in memory' sqlite_get_table()
function could be used to speed up row retrieval.
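A sketch of the size-dependent choice, assuming Python's sqlite3 module, which has no sqlite_get_table() binding; fetchall() is used here as the analogous "whole result in memory" path. SMALL_TABLE_LIMIT and fetch_rows() are hypothetical names, and the threshold value is an arbitrary placeholder for the open "how small?" question.

```python
import sqlite3
from typing import Iterator, List, Tuple, Union

# Hypothetical default; the notes leave the threshold as a configuration question.
SMALL_TABLE_LIMIT = 1000

Rows = Union[List[Tuple], Iterator[Tuple]]

def fetch_rows(conn: sqlite3.Connection, table: str,
               limit: int = SMALL_TABLE_LIMIT) -> Rows:
    """Buffer small tables completely; step through bigger ones lazily."""
    quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    if count <= limit:
        return cursor.fetchall()  # whole result in memory, like sqlite_get_table()
    return iter(cursor)           # rows are fetched one at a time while iterating
```

The extra COUNT(*) query is the price of deciding up front; a driver that already knows the table size from its meta-data could skip it.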
4. Queries entered by the user in the Query Designer should be checked for syntactic and logical validity and transformed into an SQLite-compatible form before execution. It is pointless to ask the SQLite engine whether a given SQL statement is valid, because then we would not be able to show a sufficiently detailed error message to the user.
5. SQLite not only ignores column types but also does not check value sizes, e.g. it is possible to insert a string of length 100 into a column of size 20.
These checks should be made in the KexiDB SQLite engine driver. In fact these checks could be made for every driver, because the user wants a descriptive, localized, friendly message explaining what is wrong, and no single engine provides this. We need to store parameters such as field size in the project meta-data, as SQLite does not store them in any convenient way: it stores only the 'CREATE TABLE' statement, as is.
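A minimal sketch of a driver-side size check, assuming Python's sqlite3 module for the engine part. FieldMeta and check_length() are hypothetical stand-ins for the project meta-data and the KexiDB check; the message would be passed through i18n() in Kexi. sqlite_ignores_size() shows that the engine itself keeps a 100-character value in a VARCHAR(20) column and remembers the size only inside the stored CREATE TABLE text.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical field description kept in the project meta-data."""
    name: str
    max_length: int

def check_length(field: FieldMeta, value: str) -> Optional[str]:
    """Return a user-facing error message, or None if the value fits."""
    if len(value) > field.max_length:
        # In Kexi this text would go through i18n(); plain English here.
        return (f'The value for field "{field.name}" is {len(value)} characters '
                f"long, but at most {field.max_length} characters are allowed.")
    return None

def sqlite_ignores_size() -> tuple:
    """Show that SQLite itself stores an oversized value without complaint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name VARCHAR(20))")
    conn.execute("INSERT INTO t VALUES (?)", ("x" * 100,))
    (stored_len,) = conn.execute("SELECT length(name) FROM t").fetchone()
    # The declared size survives only inside the original CREATE TABLE text.
    (ddl,) = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 't'").fetchone()
    conn.close()
    return stored_len, ddl
```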
6. Possible storage methods for an SQLite database embedded in a Kexi project:
A. A single SQLite-compatible database file (let's call it the .sqlite file)
- Advantages: Best deal for bigger databases: no need to rewrite data from the SQLite file into another one,
fastest open and save times. DB data consumes disk space only once. Other applications that use the SQLite library could also make use of the standard format of the .sqlite file's contents. The Kexi project and its data would be easily and de facto separated, which is considered good practice in DB programming.
- Disadvantages: A user who wants to transfer a database needs to know that the .kexi file does not store the data; the .sqlite file is used for that.
B. A single SQLite-compatible database file embedded inside the Kexi project's .kexi file.
SQLite requires access to a file in its own (raw) format, available somewhere in the filesystem. If the SQLite storage layer could be patched to add an option to seek to a given file position, the SQLite data could be stored after the Kexi project data. However, if the raw SQLite data were stored after the Kexi project's data, the project contents would have to be rewritten each time (and this happens quite frequently). So, in the end, storing both files concatenated during normal operation is risky, costly and difficult to implement cleanly.
- Advantages: The user does not need to know that SQLite is used in Kexi as the embedded DB engine (or even that there is any SQL engine at all). Transferring just one file between machines means successfully transferring both data and project.
- Disadvantages: none of the advantages of method A apply; open and save operations are difficult and costly (unless the SQLite storage layer could be patched).
Extensions and combinations of both methods above:
- .sqlite files compress very well, so a compression option can be added (if not for regular saving, then at least for the "Email project & data" or "Save As" actions). For these actions, concatenating the SQLite data with the Kexi project's data would be another option, convenient from the user's point of view.
CURRENT IMPLEMENTATION: method B is selected, with the above extensions added to the TODO list.
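A sketch of the "Email project & data" / "Save As" extension, assuming a ZIP container built with Python's zipfile module as the single transferable file. ZIP is swapped in here purely for illustration; these notes do not name a container format, and the member names project.xml and data.sqlite are hypothetical.

```python
import io
import zipfile
from pathlib import Path
from typing import Tuple

# Hypothetical member names inside the bundle.
PROJECT_MEMBER = "project.xml"
DATA_MEMBER = "data.sqlite"

def pack_project(project_data: bytes, sqlite_file: Path) -> bytes:
    """Bundle project meta-data and a deflate-compressed copy of the .sqlite file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(PROJECT_MEMBER, project_data)
        zf.write(sqlite_file, arcname=DATA_MEMBER)
    return buf.getvalue()

def unpack_project(bundle: bytes, target_dir: Path) -> Tuple[bytes, Path]:
    """Extract both parts; SQLite can then open the extracted raw file directly."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        project_data = zf.read(PROJECT_MEMBER)
        db_path = Path(zf.extract(DATA_MEMBER, path=target_dir))
    return project_data, db_path
```

Packing only at export time means SQLite always works on a plain raw file during normal operation, which sidesteps the seek/rewrite problems described for method B.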
7. SQLite's built-in views are read-only, so the proposal is not to use them. Here is why:
We want to have read-write queries in Kexi if the main table in a query is read-write.
<DEFINITION>: The main table T in a query Q is a table that is not on the 'many' side of the query's relations.
</DEFINITION>
<Example>:
table persons (name varchar, city integer);
table cities (id integer primary key, name varchar);
DATA: [Jarek, 1]-------[1, Warsaw]
                      /
      [Jakub, 1]-----/
query: select * from persons, cities
Now the 'cities' table is the main table (in other words, it is the MASTER table in this query).
The 'cities' table is read-write in this query, while the 'persons' table is read-only because it is on the 'many' side
of the persons-cities relation. If the cities.id field is modified, the corresponding persons.city values in related
records will be updated, provided cascading updates are enabled.
</Example>
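The cascade update mentioned at the end of the example can be tried with SQLite 3 foreign keys. A minimal sketch using Python's sqlite3 module and the persons/cities tables above; foreign-key enforcement (available since SQLite 3.6.19) is off by default and must be enabled per connection, and the SQLite 2 library these notes target did not enforce foreign keys at all.

```python
import sqlite3

def cascade_demo():
    """Change cities.id and return the persons rows afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
    conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name VARCHAR)")
    conn.execute("CREATE TABLE persons (name VARCHAR, "
                 "city INTEGER REFERENCES cities(id) ON UPDATE CASCADE)")
    conn.execute("INSERT INTO cities VALUES (1, 'Warsaw')")
    conn.executemany("INSERT INTO persons VALUES (?, 1)", [("Jarek",), ("Jakub",)])
    # Modify the master ('one' side) key; related persons.city values follow it.
    conn.execute("UPDATE cities SET id = 2 WHERE id = 1")
    rows = conn.execute("SELECT name, city FROM persons ORDER BY name").fetchall()
    conn.close()
    return rows
```

cascade_demo() returns [('Jakub', 2), ('Jarek', 2)].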
IDEAS:
A) A query result view (table view, form, etc.) should allow editing fields from the
main (master) table of the query, so every KexiDB::Field object should have a method
bool KexiDB::Field::isWritable() to let the GUI check whether editing is allowed. Note that a field object
for a given query should be allocated independently of the same field allocated for the table schema.
The former field object can disallow editing while the latter can allow it (because it is a
component of a regular table).
B) Also add a KexiDB::Field method returning a TQString with an i18n'd message explaining why
editing the given field is disallowed in the context of the given query.
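A minimal sketch of the main-table rule and of ideas A) and B), written in Python for brevity. Relations are assumed to be given as hypothetical (one_side_table, many_side_table) pairs; QueryField stands in for a per-query copy of KexiDB::Field, and is_writable()/readonly_reason() are illustrative counterparts of the proposed methods, not an existing KexiDB API. A real implementation would pass the message through i18n().

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Relation = Tuple[str, str]  # (table on the 'one' side, table on the 'many' side)

def main_tables(query_tables: Iterable[str], relations: Iterable[Relation]) -> Set[str]:
    """Tables of the query that are not on the 'many' side of any query relation."""
    tables = set(query_tables)
    many_side = {many for one, many in relations if one in tables and many in tables}
    return tables - many_side

@dataclass(frozen=True)
class QueryField:
    """Per-query field object, allocated separately from the table-schema field."""
    table: str
    name: str
    main: FrozenSet[str]  # main tables of the query this field belongs to

    def is_writable(self) -> bool:
        return self.table in self.main

    def readonly_reason(self) -> Optional[str]:
        """Explain why editing is disallowed, or return None if it is allowed."""
        if self.is_writable():
            return None
        return (f"Field '{self.table}.{self.name}' cannot be edited in this query "
                f"because table '{self.table}' is on the 'many' side of a relation.")
```

For the example above, main_tables(["persons", "cities"], [("cities", "persons")]) gives {"cities"}, so cities fields are writable and persons fields are not.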
----------------------------------------------------------------
8. ERRORS FOUND
8.1 select * from (select name from persons limit 1) limit 2
- should return 1 row, but returns 2
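A small regression check for 8.1, assuming Python's sqlite3 module and made-up test rows. The correct answer is 1; whether the wrong 2-row result still reproduces depends on the SQLite version that is linked in, which sqlite3.sqlite_version reports.

```python
import sqlite3

def nested_limit_rows() -> int:
    """Count rows returned by the query from 8.1 on the linked SQLite version."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE persons (name VARCHAR, city INTEGER)")
    conn.executemany("INSERT INTO persons VALUES (?, ?)",
                     [("Jarek", 1), ("Jakub", 1), ("Anna", 2)])
    rows = conn.execute(
        "SELECT * FROM (SELECT name FROM persons LIMIT 1) LIMIT 2").fetchall()
    conn.close()
    return len(rows)

print(sqlite3.sqlite_version, nested_limit_rows())
```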
----------------------------------------------------------------
HINTS:
PRAGMA table_info(table-name);
For each column in the named table, invoke the callback function
once with information about that column, including the
column name, data type, whether or not the column can be NULL,
and the default value for the column.
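The pragma still exists in SQLite 3; with Python's sqlite3 module the per-column callback becomes one result row per column, in the order (cid, name, type, notnull, dflt_value, pk). A minimal sketch; column_info() and its dict keys are illustrative names, and the table name is assumed to come from trusted code.

```python
import sqlite3
from typing import Any, Dict, List

def column_info(conn: sqlite3.Connection, table: str) -> List[Dict[str, Any]]:
    """Return one dict per column, as reported by PRAGMA table_info."""
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        {"name": name, "type": decl_type, "not_null": bool(notnull),
         "default": default, "pk": bool(pk)}
        for _cid, name, decl_type, notnull, default, pk in rows
    ]
```

The "type" entry is the declared type text (for example "VARCHAR(20)"), which is also the only place a declared size survives, as noted in item 5.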
---------------------------------------------------------------
OPTIMIZATION:
Re: [sqlite] Questions about sqlite's join translation
From: D. Richard Hipp <drh-X1OJI8nnyKUAvxtiuMwx3w@public.gmane.org>
Reply-To: sqlite-users-CzDROfG0BjIdnm+yROfE0A@public.gmane.org
Date: Saturday, 9 October 2004 02:59:06
Groups: gmane.comp.db.sqlite.general
Keith Herold wrote:
> The swiki says that making JOINs into a where clause is more efficient,
> since sqlite translates the join condition into a where clause.
When SQLite sees this:
    SELECT * FROM a JOIN b ON a.x=b.y;
It translates it into the following before compiling it:
    SELECT * FROM a, b WHERE a.x=b.y;
Neither form is more efficient than the other. Both will generate
identical code. (There are subtle differences on a LEFT OUTER
JOIN, but those details can be ignored when you are looking at
things at a high level, as we are.)
> It also
> says that you make queries more efficient by minimizing the number of
> rows returned in the FROM clause as far to the left as possible in the
> join. Does the latter matter if you are translating everything into a
> where clause anyway?
>
SQLite implements joins using nested loops with the outer
loop formed by the first table in the join and the inner loop
formed by the last table in the join. So for the example
above you would have:
    For each row in a:
      For each row in b such that b.y=a.x:
        Return the row
If you reverse the order of the tables in the FROM clause like
this:
    SELECT * FROM b, a WHERE a.x=b.y;
You should get an equivalent result on output, but SQLite will
implement the query differently. Specifically it does this:
    For each row in b:
      For each row in a such that a.x=b.y:
        Return the row
The trick is that you want to arrange the order of tables so that
the "such that" clause on the inner loop is able to use an index
to jump right to the appropriate row instead of having to do a
full table scan. Suppose, for example, that you have an index
on a(x) but not on b(y). Then if you do this:
    SELECT * FROM a, b WHERE a.x=b.y;
    For each row in a:
      For each row in b such that b.y=a.x:
        Return the row
For each row in a, you have to do a full scan of table b. So
the time complexity will be O(N^2). But if you reverse the order
of the tables in the FROM clause, like this:
    SELECT * FROM b, a WHERE b.y=a.x;
    For each row in b:
      For each row in a such that a.x=b.y:
        Return the row
Now the inner loop is able to use an index to jump directly to the
rows in a that it needs and does not need to do a full scan of the
table. The time complexity drops to O(N log N).
So the rule should be: For every table other than the first, make
sure there is a term in the WHERE clause (or the ON or USING clause
if that is your preference) that lets the search jump directly to
the relevant rows in that table based on the results from tables to
the left.
Other database engines with more complex query optimizers will
typically attempt to reorder the tables in the FROM clause in order
to give you the best result. SQLite is more simple-minded - it
codes whatever you tell it to code.
Before you ask, I'll point out that it makes no difference whether
you say "a.x=b.y" or "b.y=a.x". They are equivalent. All of the
following generate the same code:
      ON a.x=b.y
      ON b.y=a.x
      WHERE a.x=b.y
      WHERE b.y=a.x
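The loop order SQLite actually picks can be inspected with EXPLAIN QUERY PLAN. A sketch using Python's sqlite3 module and the a/b schema from the message (index on a(x), none on b(y)); join_plan() is an illustrative name. The wording of the plan output differs between SQLite versions, so no expected output is shown. Newer SQLite 3 releases have a query planner that may reorder inner joins on its own, while CROSS JOIN keeps the left table as the outer loop, so the FROM-order rule above matters most for old versions and for CROSS JOIN.

```python
import sqlite3
from typing import List

def join_plan(sql: str) -> List[str]:
    """Return the detail column of EXPLAIN QUERY PLAN for a join of a and b."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (x INTEGER, payload TEXT)")
    conn.execute("CREATE TABLE b (y INTEGER, payload TEXT)")
    conn.execute("CREATE INDEX a_x ON a(x)")  # index on a(x) but not on b(y)
    # The last column holds the human-readable step in every version's layout.
    details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    conn.close()
    return details

for sql in ("SELECT * FROM a, b WHERE a.x=b.y",
            "SELECT * FROM b, a WHERE b.y=a.x",
            "SELECT * FROM a CROSS JOIN b WHERE a.x=b.y"):
    print(sql, "->", join_plan(sql))
```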
---------------------------------------------------------------