Discussion:
[Abook-devel] RFC: don't rewrite needlessly the database to disk
Raphaël Droz
2014-05-08 17:17:07 UTC
Permalink
Hi,


attached is a tricky patch:
It skip database rewrites to disk when no changes actually happened.


The current issue is that there is no locking/notify mechanism for the
database file and, especially if using autosave=true, multiple abook
instances imply that "latest closed wins".

Reproducer:
- term1 run abook
- term2 run mutt, add an email (trigger abook --add-email)
# here datafile is changed on disk with the new email
- close abook in term1
# here datafile is changed on disk, the email is lost


There are several ways to avoid/warn when this happens, but all of them
first needs that we avoid file rewrites when no change happened, that
means isolating write operations.

Currently:
$ ls -i .abook/addressbook
8535 .abook/addressbook
$ abook
# press "q" to quit
$ ls -i .abook/addressbook
7943 .abook/addressbook


This is what the patch attempts to solve by marking the database "need
save" after (every) "write" operation on items.
That allowed adding a "*" indicator à la emacs to show the "db modified"
status.

The more sensible here is to ensure that *every* possible action
implying modification of the database will really triggers
db_need_save = TRUE.

If some are missing from the patch the related modifications will *not*
be saved to disk. That is to say that this patch has to be *sure*.


I need to think more about the behavior when autosave=false but IHMO
the check for "is db modified" must be activated for all *saving*
vectors, that means that pressing "w" to explicitly save the database to
disk will *not* rewrite if there is no modification.


I tried many (if not all) of the possible operations offered by the UI
and hope to not have missed any. But as shown by the hunk affecting edit.c,
database write operations are not strictly restricted to database.c.
Further testing welcome.



Next I'd see a more robust way to handle concurrent write accesses to
datafile. I can think about file-lock and inotify and I'd rather want
to avoid any need of datafile "merge".



regards
--
GPG id: 0xD7F62B21
l***@mailoo.org
2014-05-08 18:19:47 UTC
Permalink
Post by Raphaël Droz
Next I'd see a more robust way to handle concurrent write accesses to
datafile. I can think about file-lock and inotify and I'd rather want
to avoid any need of datafile "merge".
I would think of an algorithm like : watch the addressbook file with
inotify, if it is closed after a write, reload it if db_need_save is
false or merge it with the actual state if db_need_save is true. It
doesn't completely avoid the need of merging, but reduce it. To reduce
even more the risk of merges, we could rewrite the database after every
changes, but I think it would generate too much I/O and lag on big
addressbooks.

We could also do a small deamon launched when the first abook instance
is launched and closed with the last one, which would open a fifo or an
unix socket. All instances would send their changes to that deamon,
which would handle concurrent access. I think it removes completely the
need of any merge, but it would mean rewriting a lot of code, and
adding a lot of complexity.

That was just some thought of the problem, nothing like a real solution.
Raphaël Droz
2015-09-28 23:42:19 UTC
Permalink
Just pushed something similar in d7371920.
ui/interactive: when abook quits, it does not rewrite the database anymore
*if* no change has been done to the items.
Running 2 abook instances concurrently is now less race-prone.
An indicator is added to the top line showing whether the database needs
saving on disk or not.
Warning: The indicator *is* introduced by the commit, but to stay safe, the
change about optionnal database disk write is only effective for debug builds
(= compiled using DEBUG = `make CFLAGS=-DDEBUG`)
Note: right now editing a field (even without changing to the contained string) is
enough to be considered a "database change"
- if a field is modified in some way and the "*"-noted indicator does *not* show up
=> it's a bug, please report
- if a field is not modified and the "*"-noted indicator show up
=> it's a bug, please report
Starting the edition of a field is enough to imply database-modification
(right it's the expected behavior, ie: "not a bug")


"optional on-disk write" is only activated if abook is built using debug
(make CFLAGS=-DDEBUG)


In that case:
- pressing "w" does not write the file on-disk anymore *if* it's
considered unnecessary (since this must be a bug).
Given how counter-intuitive this behavior is, we may reconsider this later.

- the option has been implemented with auto-save in mind

- if auto-save is off, the user is *always* asked to save before
"quitting" (as before), in the event he replies "yes", the database is
*always* written to disk, even if it was unmodified ("least-surprise
principle")


Internally a "force_save" parameter has been introduced added to
database_save() in the following commit (aef4219d27) so that we can keep
control over this database-change-autodetection feature.
Hi,
It skip database rewrites to disk when no changes actually happened.
The current issue is that there is no locking/notify mechanism for the
database file and, especially if using autosave=true, multiple abook
instances imply that "latest closed wins".
- term1 run abook
- term2 run mutt, add an email (trigger abook --add-email)
# here datafile is changed on disk with the new email
- close abook in term1
# here datafile is changed on disk, the email is lost
There are several ways to avoid/warn when this happens, but all of them
first needs that we avoid file rewrites when no change happened, that
means isolating write operations.
$ ls -i .abook/addressbook
8535 .abook/addressbook
$ abook
# press "q" to quit
$ ls -i .abook/addressbook
7943 .abook/addressbook
This is what the patch attempts to solve by marking the database "need
save" after (every) "write" operation on items.
That allowed adding a "*" indicator à la emacs to show the "db modified"
status.
The more sensible here is to ensure that *every* possible action
implying modification of the database will really triggers
db_need_save = TRUE.
If some are missing from the patch the related modifications will *not*
be saved to disk. That is to say that this patch has to be *sure*.
I need to think more about the behavior when autosave=false but IHMO
the check for "is db modified" must be activated for all *saving*
vectors, that means that pressing "w" to explicitly save the database to
disk will *not* rewrite if there is no modification.
I tried many (if not all) of the possible operations offered by the UI
and hope to not have missed any. But as shown by the hunk affecting edit.c,
database write operations are not strictly restricted to database.c.
Further testing welcome.
Next I'd see a more robust way to handle concurrent write accesses to
datafile. I can think about file-lock and inotify and I'd rather want
to avoid any need of datafile "merge".
regards
--
GPG id: 0xF41572CEBD4218F4

------------------------------------------------------------------------------
Loading...