You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
746 lines
31 KiB
746 lines
31 KiB
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<html>
|
|
<head>
|
|
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
|
|
<title>KSpread Development Notes</title>
|
|
</head>
|
|
|
|
<body>
|
|
|
|
<h1>KSpread Development Notes</h1>
|
|
|
|
<p>Maintainer: Ariya Hidayat (<a href="mailto:ariya@kde.org">ariya@kde.org</a>)</p>
|
|
<p>Some portions by Tomas Mecir (<a href="mailto: mecirt@gmail.com">mecirt@gmail.com</a>)</p>
|
|
<p>Revision: September 2004.</p>
|
|
|
|
<h2>Introduction</h2>
|
|
|
|
<p>This document contains information about internal structure of KSpread
|
|
as well as some notes of upcoming redesign. The sources for this document
|
|
are mainly the discussions which take place in koffice-devel mailing-list
|
|
and the source code itself.</p>
|
|
|
|
<h2>Document/View Architecture</h2>
|
|
|
|
<p>Status: IN PROGRESS.</p>
|
|
|
|
<p>MVC (Model/View/Controller) means that the application consists of three
|
|
big parts, the <i>Model</i> which holds the data structure and objects,
|
|
the <i>View</i> which shows the model to the user and the <i>Controller</i>
|
|
which handles user inputs and changes the model accordingly. Like other
|
|
office applications, KSpread uses the Document/View architecture, a slightly
|
|
different variant of MVC where the View and Controller are put together
|
|
as one part.</p>
|
|
|
|
<p>In order of its complexity scale, KSpread code has to be well separated,
|
|
i.e. the <i>Document</i> and the <i>View</i>. We may also call them as
|
|
<i>back-end</i> and <i>front-end</i> respectively. Right now part of which
|
|
should belong to the Document sometimes has access to the View. For example,
|
|
a cell stores information about its metrics in pixels (which is zoom dependent),
|
|
knows whether it is visible to the user or not (which is view dependent), etc.
|
|
This needs to be changed.</p>
|
|
|
|
<p>One easy way to decide whether some stuff or relationship must really really
|
|
alienated in the Document is to imagine that somebody wants to create another
|
|
View (front-end) to the Document object model (back-end) that is being worked
|
|
on. Say, one decent guy would like to copy the look-and-feel of classic Lotus
|
|
1-2-3 (for whatever reason we are not really interested in here); so basically
|
|
to some extent he can take most part of the KSpread back-end and glue a new
|
|
user interface around the code.</p>
|
|
|
|
<h2>Dependency Handling</h2>
|
|
<p>Status: IN PROGRESS.</p>
|
|
|
|
<p>When a cell holds a formula, then it is likely that it depends on other
|
|
cell(s) for calculating the result. For example, if cell A11 has the formula
|
|
"=SUM(A1:A10)", this means that values in cells A1, A2, A3, until
|
|
A10 must be correctly calculated first before the sum can be obtained for
|
|
cell A11. This is called <i>dependency</i>.</p>
|
|
|
|
<p>As for now, KSpread tries to manage dependency by storing the dependent
|
|
cells or ranges in the cell itself. This is not too efficient. If a cell
|
|
is very simple, i.e. stored only value, not formula, such scheme will just
|
|
waste a couple of bytes of pointers for the dependency data structure.
|
|
It is much more wiser to simply create one <i>dependency manager</i> for each
|
|
worksheet; it should be responsible for maintaining and handling cell
|
|
dependencies for that sheet. Also KSpread always stores ranges which depend
|
|
on one particular cell and ranges whose one of its dependent is that cell
|
|
(and these are all in the cell structure itself). This is not necessary as
|
|
that information is redundant. The dependency manager should be able to handle
|
|
both cases.</p>
|
|
|
|
<p>Let us have a look at this simple example:</p>
|
|
|
|
<table cellspacing="0" cellpadding="3" border="1">
|
|
<tr>
|
|
<td align="center"> </td>
|
|
<td align="center">A</td>
|
|
<td align="center">B</td>
|
|
<td align="center">C</td>
|
|
<td align="center">D</td>
|
|
</tr>
|
|
<tr>
|
|
<td align="center">1</td>
|
|
<td align="right">14</td>
|
|
<td align="right">36</td>
|
|
<td align="right"> </td>
|
|
<td align="right"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td align="center">2</td>
|
|
<td align="right">3</td>
|
|
<td align="right"> </td>
|
|
<td align="right"> </td>
|
|
<td align="right"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td align="center">3</td>
|
|
<td align="right">77</td>
|
|
<td align="right"> </td>
|
|
<td align="right"> </td>
|
|
<td align="right"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td align="center">4</td>
|
|
<td align="right">=SUM(<b>A1:A3</b>)</td>
|
|
<td align="right">=A4+SUM(<b>B1:B3</b>)</td>
|
|
<td align="right">=100*<b>B4</b></td>
|
|
<td align="right"> </td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p>Such sheet should produce dependencies like:</p>
|
|
|
|
<table cellspacing="0" cellpadding="3" border="1">
|
|
<tr>
|
|
<td><b>Reference</b></td>
|
|
<td><b>Dependent(s)</b></td>
|
|
</tr>
|
|
<tr>
|
|
<td>A4</td>
|
|
<td>A1:A3</td>
|
|
</tr>
|
|
<tr>
|
|
<td>B4</td>
|
|
<td>A4 and B1:B3</td>
|
|
</tr>
|
|
<tr>
|
|
<td>C4</td>
|
|
<td>B4</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p>When we want to recalculate cell B4, from the dependencies shown above we
|
|
may know that first we need to know values of cell A4 and range B1:B3. Further on,
|
|
cell A4 needs to know values of cells in range A1:A3. Therefore, <i>given one reference
|
|
cell</i> (e.g. B4), the dependency manager must be able to <i>return all
|
|
dependents, cells and/or ranges</i> (e.g. A4, B1:B3). Do we need to go
|
|
recursively when searching for dependencies? That really depends on the
|
|
implementation, but it is not a big problem, though.</p>
|
|
|
|
<p>In another case, say the user has changed cell A3 so we need to update the
|
|
calculation. We should not recalculate the whole sheet because it wastes time.
|
|
We just need to recalculate cells that depend on A3, in this case A4, B4 and C4.
|
|
So the dependency manager has another responsibility: <i>given a cell</i>
|
|
it should <i>find all cells and/or ranges which depend on that particular
|
|
cell</i>. It is a matter of iterating over all dependencies and checking
|
|
whether the cell is within the dependent(s) and returning the reference cell.
|
|
In this example, cell A3 is in the range A1:A3, a dependent range of cell A4.
|
|
Hence, we just return A4. Recursive or not, we can either continue finding
|
|
dependents of A4 or just stop here.</p>
|
|
|
|
<p>Note also that dependency manager should not store cell pointers, but rather
|
|
only the location of the cell (i.e. the sheet that owns the cell, row number and
|
|
column number). This is because on some cases the dependent cell may not exist
|
|
yet. As illustrated in the example, dependents of cell B4 are A4, B1, B2 and B3
|
|
but here cells B2 and B3 are still empty. Of course, when we just want to
|
|
know which cells we need to recalculate for one reference cell, the dependency
|
|
manager is allowed to return only non-empty cells (e.g. A4 and B1 in our case)
|
|
as empty cells have no effect and will not be recalculated anyway.</p>
|
|
|
|
<p>By the same manner, dependency manager can also held responsible when
|
|
chart comes into play. Any charts placed in the sheet (that are actually KChart
|
|
parts) depend on some values of the cells. An action by the user to changing
|
|
those cells, directly or indirectly, should trigger the update of the respective
|
|
charts.</p>
|
|
|
|
<p>Inter-sheet dependencies can be well handled if we store the owner of
|
|
each dependent. This is not shown yet in the explanation above to avoid
|
|
unnecessary complication. But let have one example now: if Sheet2!A1 is
|
|
"=SUM(Sheet1!A1:A10)" then changing Sheet1!A1 (the dependent)
|
|
means updating Sheet2!A1 (the reference). Of course during recalculation we
|
|
must take care that all sheets in the document must be processed, even though
|
|
only one single cell in one sheet has been changed.</p>
|
|
|
|
<p>Implementation-wise, there will be one instance of the dependency manager
|
|
for each sheet. This class will fully manage all dependencies and trigger cell
|
|
recalculation. An important part of this concept is this: The cell itself knows
|
|
<em>nothing</em> about dependencies, and it doesn't care about them either.
|
|
The cell will just inform about the fact, that its value has been changed,
|
|
and the dependency manager will do the rest. In addition, this gives us
|
|
recursive dependency calculation at almost no cost.</p>
|
|
|
|
<h2>Manipulators</h2>
|
|
<p>Status: PLANNED.</p>
|
|
|
|
<p>Currently, every operation on a cell or on a range of cells is quite complex.
|
|
You need to ensure correct repainting, recalculation, iterate on a range and so on.</p>
|
|
|
|
<p>To address this issue, manipulators shall be implemented. A manipulator will
|
|
implement one operation (formatting change, sequence fill, ..., ...).</p>
|
|
|
|
<p>Basically, usage of a manipulator should look like this:</p>
|
|
|
|
<p><pre>
|
|
Manipulator *manip = manipulatorManager::self()->getManip ("seqfill");
|
|
manip->setArgument ("type", 1);
|
|
... (more setArgument's)
|
|
manip->exec (selection);
|
|
</pre></p>
|
|
|
|
<p>That's all...</p>
|
|
|
|
<p>What concerns manipulator implementation, you'll derive from the base
|
|
manipulator and reimplement constructor and methods initialize()
|
|
(called just before the operation starts), processCell(), and maybe
|
|
done(). The constructor or initialize() would set some properties for
|
|
the cell-walking algorithm, and then it won't care about it anymore.
|
|
The base class will walk the range and call processCell() for each
|
|
cell, possibly creating it if it doesn't exist (if the manipulator
|
|
wants so).There will also be some methods that can be used to process
|
|
the whole range or row/column at once, if the manipulator wants to do so
|
|
(useful for, say, formatting manipulators that will be able to set attributes
|
|
of a whole range or row/col, in accordance with thoughts about format storage
|
|
below.</p>
|
|
|
|
<p>In addition, the manipulator can implement the undo/redo functionality
|
|
- the base manipulator will provide some common stuff needed to
|
|
accomplish this.</p>
|
|
|
|
<h2>Selection handling</h2>
|
|
<p>Status: PLANNED</p>
|
|
|
|
<p>The selection shall be an instance of some RangeList class,
|
|
or however we want to call it - this will contain a list of
|
|
cells/ranges/rows/whatever - like current selection, but will contain
|
|
more entries. This will allow easy implementation of CTRL-selections and so,
|
|
because thanks to manipulators, each operation will automatically support these.</p>
|
|
|
|
<h2>Repaint Triggering</h2>
|
|
<p>Status: PLANNED</p>
|
|
|
|
<p>As mentioned above, the interface between the core and the GUI needs to be kept
|
|
at minimum. Also, the number of repaints needs to be as low as possible, and repaints
|
|
should be groupped whenever possible. To achieve all this, the following approach
|
|
can be used:</p>
|
|
|
|
<p>When a cell is changed, it calls some method in KSpread::Sheet - valueChanged()
|
|
or formattingChanged(). These methods then trigger everything necessary, like
|
|
a call to the painting routine or dependency calculation.</p>
|
|
|
|
<p>This simple system would work on itself, but it would be slow. If you do
|
|
a sequence fill on A1:A1000 and you have a SUM(A1:A1000) somewhere, why would
|
|
you want to compute that SUM 1000 times, when you can simply compute it after
|
|
the sequence fill has been finished? Hence, the sheet will offer some more
|
|
methods - disableUpdates(), enableUpdates(), rangeListChanged() and
|
|
rangeListFormattingChanged(). All these will be used (solely?) by manipulators,
|
|
preferably by the base manipulator class, so that we don't have to call these
|
|
functions in each operation. After a call to disableUpdates(), there will
|
|
be no repainting and no dependency calculation. Note that a call to
|
|
enableUpdates() won't cause any repaints either, as the sheet cannot remember
|
|
all the calls (due to loss of range information). Hence, the base manipulator
|
|
class needs to call the correct rangeList*Changed method to trigger an
|
|
update in an effective way. The base manipulator needs to be configurable by
|
|
the manipulators that derive from it, so that it knows whether it changed
|
|
cell's content or formatting.</p>
|
|
|
|
<h2>Formula Engine</h2>
|
|
<p>Status: FINISHED.</p>
|
|
|
|
<p>This formula engine is just an expression evaluator. To offer better
|
|
performance, the expression is first compiled into byte codes which will
|
|
be executed later by a virtual machine.</p>
|
|
|
|
<p>Before compilation, the expression is separated into pieces, called tokens.
|
|
This step, which is also known as lexical analysis, takes places at once
|
|
and will produce sequence of tokens. They are however not stored and used only
|
|
for the purpose of the subsequent step.
|
|
Tokens are supplied to the parser, also known as syntax analyzer. In this
|
|
design, the parser is also a code generator. It involve the generation
|
|
of byte codes which represents the expression.
|
|
Evaluating the formula expression is now basically running the virtual
|
|
machine to execute compiled byte codes. No more scanning or parsing
|
|
are performed during evaluation, this saves time a lot.</p>
|
|
|
|
<p>The virtual machine itself (and of course the byte codes) are designed to be
|
|
as simple as possible. This is supposed to be stack-based, i.e. the virtual
|
|
machine has an execution stack of values which would be manipulated
|
|
as each byte code is processed. Beside the stack, there will be a list of
|
|
constant (sometimes also called as "constants pool") to hold Boolean,
|
|
integer, floating-point or string values. When a certain byte code needs
|
|
a constant for the operand, an index is specified which should be used
|
|
to look up the constant in the constants pool.</p>
|
|
|
|
<p>There are only few byte code, sufficient enough to perform calculation.
|
|
Yes, this is really minimalist but yet does the job fairly well.
|
|
The following provides brief description for each type of bytecode.</p>
|
|
|
|
<blockquote>
|
|
|
|
<p><i>Nop</i> means no operation.</p>
|
|
|
|
<p><i>Load</i> means loads a constant and push it to the stack. The constant can
|
|
be found at constant pools, at position by 'index', it could be a Boolean,
|
|
integer, floating-point or string value.</p>
|
|
|
|
<p><i>Ref</i> means gets a value from a reference. Member variable 'index' will
|
|
refers to a string value in the constant pools, i.e. the name of the reference.
|
|
Typically the reference is either a cell (e.g. A1), range of cells (A1:B10)
|
|
or possibly function name. Example: expression A2+B2 will be compiled as:<br>
|
|
Constants:<br>
|
|
#0: "A2"<br>
|
|
#1: "B2"<br>
|
|
Codes:<br>
|
|
Ref #0<br>
|
|
Ref #1<br>
|
|
Add
|
|
</p>
|
|
|
|
<p><i>Function</i>.
|
|
Example: expression "sin(x)" will be compiled as:<br>
|
|
Constants:<br>
|
|
#0: "sin"<br>
|
|
#1: "x"<br>
|
|
Codes:<br>
|
|
Ref #0<br>
|
|
Ref #1<br>
|
|
Function 1
|
|
</p>
|
|
|
|
<p><i>Neg</i> is a unary operator, a value is popped from stack and negated and then
|
|
pushed back to the stack. If it is not number (Boolean or string), it
|
|
will be converted first.</p>
|
|
|
|
<p><i>Add, Sub, Mul, Div and Pow</i> are binary operators, two values are popped from
|
|
stack and processed (added, subtracted, multiplied, divided, or power) and
|
|
the result is pushed to the stack.</p>
|
|
|
|
<p><i>Concat</i> is string operation, two values are popped from stack (and converted
|
|
to string if they are not string values), concatenated, and the result is
|
|
pushed to the stack.</p>
|
|
|
|
<p><i>Not</i> is a logical operation, a value is popped from stack and its Boolean
|
|
not is pushed into the stack. When it is not Boolean value, there will be
|
|
a cast.</p>
|
|
|
|
<p><i>Equal, Less, and Greater</i> are comparison operators, two values are
|
|
popped from stack and compared appropriately. The result, which is a Boolean
|
|
value, is pushed into the stack. To simplify, there no "not equal"
|
|
comparison because it can be regarded as "equal" followed by
|
|
"not" byte codes. Same goes for "less than or equal to" and
|
|
"greater than or equal to".</p>
|
|
|
|
</blockquote>
|
|
|
|
<p>The expression scanner is based on finite state acceptor. The state denotes
|
|
the position of cursor, e.g. inside a cell token, inside an identifier, etc.
|
|
State transition is following by emitting the associated token to the
|
|
result buffer. Rather than showing the state diagrams here, it is much more
|
|
convenience and less complicated to browse the scanner source code and try
|
|
to follow its algorithm from there.</p>
|
|
|
|
<p>The parser is designed using one of bottom-up parsing technique, namely
|
|
based on Polish notation. Instead of ordering the tokens in suffix Polish
|
|
form, the parser (which is also the code generator) simply outputs
|
|
byte codes. In its operation, the parser requires the knowledge of operator
|
|
precedence to correctly translate unparenthesized infix expression and
|
|
thus requires the use of a syntax stack.</p>
|
|
|
|
<p>The parser algorithm is given as follows:</p>
|
|
|
|
<blockquote>
|
|
Repeat the following steps:<br>
|
|
Step 1: Get next token<br>
|
|
Step 2: If it is an identifier<br>
|
|
- push it to syntax stack<br>
|
|
- generated "Ref"<br>
|
|
Step 3: If it is a Boolean, integer, float or string value<br>
|
|
- push it to syntax stack<br>
|
|
- generated "Load"<br>
|
|
Step 4: If it is an operator<br>
|
|
- check for reduce rules<br>
|
|
<br>
|
|
- when no more rules applies, push token to the syntax stack<br>
|
|
</blockquote>
|
|
|
|
<p>The reduce rules are:</p>
|
|
|
|
<p>Rule A: <i>function argument</i>:
|
|
if token is semicolon or right parenthesis,
|
|
if syntax stack looks as:
|
|
<ul type="square">
|
|
<li>non-operator <--- top</li>
|
|
<li>operator ;</li>
|
|
<li>non-operator</li>
|
|
<li>operator (</li>
|
|
<li>identifier</li>
|
|
</ul>
|
|
then reduce to
|
|
<ul type="circle">
|
|
<li>non operator</li>
|
|
<li>operator (</li>
|
|
<li>identifier</li>
|
|
<li>increase number of function arguments</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p>Rule B: last function argument<br>
|
|
if syntax stack looks as:<br>
|
|
<ul type="square">
|
|
<li>operator )</li>
|
|
<li>non-operator</li>
|
|
<li>operator (</li>
|
|
<li>identifier</li>
|
|
</ul>
|
|
then reduce to:<br>
|
|
<ul type="circle">
|
|
<li>non-operator</li>
|
|
<li>generated "Function" + number of function arguments</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p>Rule C: function without argument<br>
|
|
if syntax stack looks as:<br>
|
|
<ul type="square">
|
|
<li>operator )</li>
|
|
<li>operator (</li>
|
|
<li>identifier</li>
|
|
</ul>
|
|
then reduce to:<br>
|
|
<ul type="circle">
|
|
<li>non-operator (dummy)</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p>Rule D: parenthesis removal<br>
|
|
if syntax stack looks as:<br>
|
|
<ul type="square">
|
|
<li>operator (</li>
|
|
<li>non-operator</li>
|
|
<li>operator )</li>
|
|
</ul>
|
|
then reduce to:<br>
|
|
<ul type="circle">
|
|
<li>non-operator</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p>Rule E: binary operator<br>
|
|
if syntax stack looks as:<br>
|
|
<ul type="square">
|
|
<li>non-operator</li>
|
|
<li>binary operator</li>
|
|
<li>non-operator</li>
|
|
<li>and if the precedence of the binary operator in the syntax stack
|
|
is greater or equals to the precedence of token</li>
|
|
</ul>
|
|
then reduce to:<br>
|
|
<ul type="circle">
|
|
<li>non-operator</li>
|
|
<li>and generated appropriate byte code for the binary operator</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p>Rule F: unary operator<br>
|
|
if syntax stack looks as:<br>
|
|
<ul type="square">
|
|
<li>non-operator</li>
|
|
<li>unary operator</li>
|
|
<li>operator</li>
|
|
</ul>
|
|
then reduce to:<br>
|
|
<ul type="circle">
|
|
<li>operator</li>
|
|
<li>and generated "Neg" if unary operator is '-'</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p>Percent operator is a special case and not handled the above mentioned rule.
|
|
When the parser finds the percent operator, it checks whether there's a non-operator
|
|
token right before the percent. If yes, then the following code is generated:
|
|
<tt>load 0.01</tt> followed by <tt>multiply</tt>.</p>
|
|
|
|
<h2>Value</h2>
|
|
|
|
<p>Status: FINISHED.<br>
|
|
</p>
|
|
|
|
<p>to be written.</p>
|
|
|
|
<h2>Commands Based on KCommand<br>
|
|
</h2>
|
|
|
|
<p>Status: IN PROGRESS.</p>
|
|
|
|
<p>Until lately, to implement undo and redo, KSpread creates corresponding
|
|
KSpreadUndo classes for each action and runs them when the user undoes
|
|
those actions. KSpreadUndo also has redo function whose job is to redo
|
|
again the action after being undone.</p>
|
|
|
|
<p>All this needs to be converted to manipulators - these will be KCommand,
|
|
hence we should be able to undo/redo every operation (provided that the
|
|
corresponding manipulator provides methods to store/recall the undo information).</p>
|
|
|
|
<h2>Cell Storage</h2>
|
|
<p>Status: PLANNED.</p>
|
|
|
|
<p>Cells are grouped together, and then hashed.</p>
|
|
|
|
<h2>Format Storage</h2>
|
|
<p>Status: PLANNED.</p>
|
|
|
|
<p>Formatting specifies how a cell should look like. It involves font
|
|
attributes like bold or italics, vertical and horizontal alignment,
|
|
rotation angle, shading, background color and so on. Each cell can have
|
|
its own format, but bear also in mind that a whole row or column format
|
|
should also apply.</p>
|
|
|
|
<p>Current way of storing formatting information is rather inefficient:
|
|
pack it together inside the cell. The reason is because most of cells
|
|
are either very plain (no formatting) and/or only have partial attribute
|
|
(e.g. only marked as bold, no font family or color is specified).
|
|
Therefore the approximately 20 bytes used to hold formatting information
|
|
are quite a waste of memory. Even worse, this requires that the cell
|
|
must exist even if it is not in use. As illustration, imagine a worksheet
|
|
where within range A1:B20 only 5 cells are not empty. When the user
|
|
selects this range and changes the background color to yellow, then
|
|
those 5 cells must store this information in their data structure but
|
|
how about the other 35 cells? Since the formatting is attached to the cell,
|
|
there is no choice but to create them. Doing this, just for the sake of
|
|
storing format, is actually not really good.</p>
|
|
|
|
<p>A new way to store formatting information is proposed below.</p>
|
|
|
|
<p>For each type of format a user can use, we have the corresponding
|
|
<i>formatting piece</i>, for example "bold on", "bold off",
|
|
"font Arial", "left-border 1 px", etc. Whenever the user
|
|
applies formatting to a range (could also be a whole column, row, or worksheet),
|
|
we save appropriate respective formatting piece in a stack. Say the user has
|
|
marked column B as bold, row 2 as italics, and set range A1:C5 with
|
|
yellow background color. Our formatting stack would look like:
|
|
</p>
|
|
|
|
<table cellspacing="0" cellpadding="3" border="1">
|
|
<tr>
|
|
<td><b>Range</b></td>
|
|
<td><b>Formatting Piece</b></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Column B</td>
|
|
<td>Bold on</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Row 2</td>
|
|
<td>Italics on</td>
|
|
</tr>
|
|
<tr>
|
|
<td>A1:C5</td>
|
|
<td>Yellow background</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p>Now let try to figure out the overall format of cell B2. From the first
|
|
we know it should be bold, from the second it should be italics, and last
|
|
it should have yellow background. This complete format is the one which
|
|
we used to render cell B2 on screen. In similar fashion, we can know that
|
|
cell A1 on the other only specifies the yellow background, because the first
|
|
and second pieces do not apply there.</p>
|
|
|
|
<p>Another possible way to see the format storage is by using 3-D perspective.
|
|
For each formatting piece, imagine there is a surface which covers the
|
|
formatted range (the xy-plane). The formatting information is simply attached
|
|
to the surface (say, as surface attribute). Every surface is stacked together,
|
|
its depth (the z-axis) denotes the sequence, i.e. the first surface is the
|
|
deepest. For the example above, we can view the pieces as one surface specifies
|
|
"bold on" which is a vertical of column B, one surface specifies
|
|
"italics on" which is a horizontal band of row 2 and one last surface
|
|
which specifies "yellow background" stretched in the range A1:C5.
|
|
How to find complete format for A2? This is now a matter of surface
|
|
determination. Traversing in the direction of z-axis from B2 reveals
|
|
that we hit the last and second surfaces only; thereby we can know the complete
|
|
format is "italics, yellow background".</p>
|
|
|
|
<p>It is clear that a format storage corresponds to one sheet only. For each
|
|
sheet, there should be one format storage. Cells can still have accessors to
|
|
its formatting information, these simply become wrapper for proper calls to the
|
|
format storage. Since each formatting piece holds information about the applied
|
|
range, we must take care that the formatting storage is correctly updated
|
|
whenever cells, rows or columns are inserted and deleted.</p>
|
|
|
|
<p>In order to avoid low performance, we must use a smart way to iterate over
|
|
all formatting pieces whenever we want to find out complete format for given
|
|
cell(s). When the sheet gets very complex, it is likely that we will have
|
|
many many formatting pieces that are not even overlap. This means, when we
|
|
need formatting of cell A1, it is no use to check formatting pieces of range
|
|
Z1:Z10 or A100:B100 which do not include cell A1 and are even very far from A1.
|
|
One possible solution is to arrange the formatting pieces in a quad-tree.
|
|
Because one piece can cover a very large area, it is possible that it will
|
|
be in more than one leaf in the quad-tree. <i>Details on the possible use of
|
|
quad-tree or other methods should be explored further more</i>.</p>
|
|
|
|
<h2>Default Toolbars</h2>
|
|
<p>Status: IN PROGRESS.</p>
|
|
|
|
<p>Relevant mailing-list threads:</p>
|
|
|
|
<ul>
|
|
<li><a href="http://lists.kde.org/?l=kde-usability&m=110751297906231&w=2">
|
|
http://lists.kde.org/?l=kde-usability&m=110751297906231&w=2</a></li>
|
|
<li><a href="http://lists.kde.org/?l=koffice-devel&m=110723903332496&w=2">
|
|
http://lists.kde.org/?l=koffice-devel&m=110723903332496&w=2</a></li>
|
|
</ul>
|
|
|
|
<p>Toolbars are utilized to place most frequently used actions. It is
|
|
important to present the user with default toolbars which make
|
|
sense, i.e. they do not contain unnecessary buttons. In-depth usability
|
|
analysis and/or further discussions are needed to make decision which
|
|
buttons need to be in the toolbar and which don't.</p>
|
|
|
|
<p>For reference, here is a list of default shown toolbars in
|
|
some spreadsheet applications:</p>
|
|
|
|
<p><b>Microsoft Excel 2002</b>:</p>
|
|
|
|
<ul>
|
|
<li><em>Standard toolbar</em>: New, Open, Save, Search, Print, Print Preview,
|
|
Spelling, Cut, Copy, Paste, Format Painter, Undo, Redo, Insert Hyperlink,
|
|
Autosum, Sort Ascending, Sort Descending, Chart Wizard, Drawing, Zoom, Help</li>
|
|
<li><em>Formating toolbar:</em>: Font, Font Size, Bold, Italic, Align Left,
|
|
Align Center, Align Right, Merge and Center, Format Currency, Format Percent,
|
|
Comma Style, Increase Precision, Decrease Precision, Increase Indent,
|
|
Decrease Indent, Merge Cells, Unmerge Cells, Merge Across, Borders,
|
|
Fill Color, Font Color</li>
|
|
</ul>
|
|
|
|
|
|
<p><b>OpenOffice 2.0</b>:</p>
|
|
|
|
<ul>
|
|
<li><em>Standard toolbar</em>: New, Open, Save, E-mail, Edit, PDF, Print,
|
|
Page Preview, Spellcheck, Cut, Copy, Paste, Format Paintbrush, Undo, Redo,
|
|
Hyperlink, Sort Ascending, Sort Descending, Insert Chart, Navigator, Styles,
|
|
Gallery, Zoom</li>
|
|
<li><em>Format Object toolbar:</em> Font Name, Font Size, Bold, Italic,
|
|
Underline, Font Color, Align Left, Align Center, Align Right, Justified,
|
|
Merge Cells, Format as Currency, Format as Percent, Format Standard,
|
|
Increase Precision, Decrease Precision, Increase Indent, Decrease Indent,
|
|
Borders, Line Style, Line Color, Background Color, Align Top, Align Center
|
|
Vertically, Align Bottom </li>
|
|
</ul>
|
|
|
|
<p><b>Gnumeric 1.4</b>:</p>
|
|
|
|
<ul>
|
|
<li><em>Standard toolbar</em>: New, Open, Save, Print, Print Preview,
|
|
Cut, Copy, Paste, Undo, Redo, Hyperlink, Autosum, Function/Formula,
|
|
Insert Chart, Zoom</li>
|
|
<li><em>Format toolbar:</em> Font Family, Font Size, Bold, Italic,
|
|
Underline, Align Left, Align Center, Align Right, Merge and Center,
|
|
Merge Cells, Split Cell, Format Currency, Format Percent, Thousand Separator,
|
|
Increase Precision, Decrease Precision, Decrease Indent, Increase Indent,
|
|
Borders, Background, Foreground</li>
|
|
</ul>
|
|
|
|
<h2>Test Framework</h2>
|
|
<p>Status: IN PROGRESS.</p>
|
|
|
|
<p>Relevant mailing-list threads:</p>
|
|
|
|
|
|
<ul>
|
|
<li><a href="http://lists.kde.org/?l=kde-cvs&m=109511244721480&w=2">
|
|
http://lists.kde.org/?l=kde-cvs&m=109511244721480&w=2</a></li>
|
|
<li><a href="http://lists.kde.org/?l=koffice-devel&m=109518196922944&w=2">
|
|
http://lists.kde.org/?l=koffice-devel&m=109518196922944&w=2</a></li>
|
|
</ul>
|
|
|
|
<p>It is well known that writing clean and easily understandable module will lead
|
|
to better maintenance. However testing that particular module everytime there is
|
|
a significant change requires considerable amount of time and effort. Since KSpread
|
|
and other applications of its scale consist of hundreds of modules, in this case
|
|
automatic testing of each module will help a lot, not to mention that it might
|
|
catch bug as early as possible.</p>
|
|
|
|
<p>KSpread has a simple test framework to facilitate such kind of test. This can
|
|
be activated using the shortcut <tt>Ctrl+Shift+T</tt>. This test is however
|
|
not accessible via menu, because it is intended to be used only by the developers.
|
|
Ideally, there should be tests for all modules contained in KSpread. It is
|
|
the responsibility of the developer to create the corresponding <tt>tester</tt>
|
|
for the code that he or she is working on. All tests should be kept in
|
|
<tt>koffice/kspread/tests/</tt>.</p>
|
|
|
|
<p>Making a new tester is not difficult. The easiest way is to copy an already
|
|
existing tester and modify it. Basically, it must be a subclass of class Tester
|
|
(see <tt>koffice/kspread/tests/tester.h</tt>). Just reimplement the virtual
|
|
function <tt>run()</tt> and it is ready. In order to make it possible to run
|
|
the new tester, add an instance of the class in TestRunner
|
|
(for details, see <tt>koffice/kspread/tests/testrunner.cc</tt>).</p>
|
|
|
|
<p>A tester must be self-contained, it should not use any test data from
|
|
current document. If necessary, it must create (or hard code) the data by
|
|
itself.</p>
|
|
|
|
<p>Whenever parts of KSpread features are improved or rewritten, it is always
|
|
a good idea to run the related tests to ensure that all the changes do not do
|
|
any harm. However, bear in mind that there is no 100% guarantee that the new
|
|
code is bug-free.</p>
|
|
|
|
<p>Also, if there is a bug which is not caught by the tester (i.e. it does not
|
|
fail the tester, but the bug is confirmed), then the relevant tester must be
|
|
modified to include one or more test cases similar to the offending bug.
|
|
When the bug is finally fixed, from that point the test should always pass all
|
|
test cases.</p>
|
|
|
|
</p>
|
|
|
|
<h2>Coding Style</h2>
|
|
<p>Status: IN PROGRESS.</p>
|
|
|
|
<p>(to be written in details).</p>
|
|
|
|
<p>Write <b>clean code</b>. To be correct is better than to be fast.
|
|
KSpread source code is known to grow very fast in its early days and but later
|
|
on also more difficult to understand.</p>
|
|
|
|
<p>Put comment as documentation for classes and member functions. There is still
|
|
lack of documentation as for now, whoever understands something about the
|
|
classes and functions should write the documentation.</p>
|
|
|
|
<p>In complex source files, list of header includes can be very long. Unless
|
|
there is special reason not do it, try to group them together, i.e. standard
|
|
C/C++ headers come first, followed by TQt headers, and then KDE headers,
|
|
KOffice core/UI headers and application specific headers. For each group,
|
|
sort the header files alphabetically. </p>
|
|
|
|
<p>Write test cases. This will ease further maintenance. See also the section on Test
|
|
Framework above.</p>
|
|
|
|
<p>Do not use the term <i>table</i>. It was incorrectly invented quite likely
|
|
because of the term <i>Tabelle</i> (German, literally means table). The correct
|
|
term is <i>sheet</i> or <i>worksheet</i>. The English version of Microsoft
|
|
uses <i>sheet</i> while the German version uses <i>Tabelle</i>.</p>
|
|
|
|
<p>Use <a href="http://developer.kde.org/documentation/library/kdetqt/trinityarch/devel-binarycompatibility.htm">d-pointer</a> trick (also known pimpl) whenever possible. Such practice will help when later on
|
|
we want to expose the API and need to maintain binary compatibility. But the
|
|
most important thing is to separate the interface and the implementation.
|
|
Furthermore, build time is reduced since modification on the implementation
|
|
would not cause tons of recompile.</p>
|
|
|
|
<p>When creating a new class, use namespace KSpread. Do not use KSpread prefix
|
|
anymore. Example: use <tt>KSpread::Foo</tt> instead of <tt>KSpreadFoo</tt>.
|
|
Also source file name should not contain kspread prefix anymore, i.e.
|
|
<tt>foo.h</tt> and <tt>foo.cc</tt> (but not <tt>kspread_foo.h</tt> and
|
|
<tt>kspread_foo.cc</tt>) for the above example.</p>
|
|
|
|
</body></html>
|