

Upgrade Fedora 9 to 10, with IPA-server

Hi again,

Today I upgraded my Fedora 9 IPA server to the latest Fedora release, Fedora 10.

Reckless as I am, I simply upgraded the whole system (including all of its db4 libraries) while the IPA instance was still running. That increases the risk of corrupting the live Berkeley DB databases that form the back end of the LDAP instances.
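In hindsight, I should have stopped the services and backed up the databases before upgrading. Below is a minimal sketch of that, not something I actually ran. It assumes the service names used later in this post (krb5kdc, dirsrv) and the default /var/lib/dirsrv data directory. The function name, the DIRSRV_ROOT and BACKUP_DIR variables and the archive name are made up for illustration.

```shell
# Hypothetical pre-upgrade routine: stop the KDC and the directory server,
# then archive the Berkeley DB files while no process has them open.
prepare_ipa_upgrade() {
  src=${DIRSRV_ROOT:-/var/lib/dirsrv}   # assumed location of the instance data
  dest=${BACKUP_DIR:-/root}             # assumed place to keep the archive
  # krb5kdc reads its database from dirsrv, so stop it first.
  service krb5kdc stop || return 1
  service dirsrv stop || return 1
  # Archive the whole data directory (relative paths inside the tarball).
  tar -czf "$dest/dirsrv-backup.tar.gz" -C "$(dirname "$src")" "$(basename "$src")"
}
```

I would start the actual upgrade only after this succeeds. If other services on your box talk to the directory server, they should be stopped as well.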

I managed to fix the problems I ran into. Read on for my investigation and how I solved them. It might be of use if you happen to see the same symptoms.

Anyway, after the upgrade completed, I rebooted the VM, and it took ages to come back up. What was the problem? After quite a while I managed to log into the box, and I noticed that the dirsrv instance had failed to start. The krb5kdc hadn't started either. That was to be expected, since it depends on dirsrv for its database.

Now, why didn't the dirsrv server come up? I went searching and found the error log in /var/log/dirsrv/slapd-<instance>/errors, which contained:

[19/Feb/2009:17:12:29 +0100] - Fedora-Directory/1.1.3 B2008.289.115 starting up
[19/Feb/2009:17:12:29 +0100] - Clean up db environment and start from archive.
[19/Feb/2009:17:12:29 +0100] - libdb: Program version 4.7 doesn't match environment version 4.6
[19/Feb/2009:17:12:29 +0100] - libdb: Program version 4.7 doesn't match environment version 4.6
[19/Feb/2009:17:12:29 +0100] - Deleting log file: (/var/lib/dirsrv/slapd-<instance>/db/log.0000000021)
[19/Feb/2009:17:12:31 +0100] - libdb: file userRoot/id2entry.db4 has LSN 1/546977, past end of log at 1/84
[19/Feb/2009:17:12:31 +0100] - libdb: Commonly caused by moving a database from one database environment
[19/Feb/2009:17:12:31 +0100] - libdb: to another without clearing the database LSNs, or by removing all of
[19/Feb/2009:17:12:31 +0100] - libdb: the log files from a database environment
[19/Feb/2009:17:12:31 +0100] - libdb: /var/lib/dirsrv/slapd-<instance>/db/userRoot/id2entry.db4: unexpected file type or format
[19/Feb/2009:17:12:31 +0100] - dbp->open("userRoot/id2entry.db4") failed: Invalid argument (22)
[19/Feb/2009:17:12:31 +0100] - dblayer_instance_start fail: Invalid argument (22)
[19/Feb/2009:17:12:31 +0100] - start: Failed to start databases, err=22 Invalid argument
[19/Feb/2009:17:12:31 +0100] - Failed to allocate 10000000 byte dbcache.  Please reduce nsslapd-cache-autosize and Restart the server.
[19/Feb/2009:17:12:31 +0100] - Failed to start database plugin ldbm database
[19/Feb/2009:17:12:31 +0100] - libdb: DB->get: method not permitted before handle's open method
[19/Feb/2009:17:12:31 +0100] - id2entry error 22
[19/Feb/2009:17:12:31 +0100] - id2entry get error 22
[19/Feb/2009:17:12:31 +0100] - libdb: DB->get: method not permitted before handle's open method
[19/Feb/2009:17:12:31 +0100] - id2entry error 22
[19/Feb/2009:17:12:31 +0100] - id2entry get error 22
[19/Feb/2009:17:12:31 +0100] schema-compat-plugin - warning: no entries set up under cn=groups,cn=compat,dc=ipa,dc=hoekstra,dc=lan
[19/Feb/2009:17:12:31 +0100] - libdb: DB->get: method not permitted before handle's open method
[19/Feb/2009:17:12:31 +0100] - id2entry error 22
[19/Feb/2009:17:12:31 +0100] - id2entry get error 22
[19/Feb/2009:17:12:31 +0100] - libdb: DB->get: method not permitted before handle's open method
[19/Feb/2009:17:12:31 +0100] - id2entry error 22
[19/Feb/2009:17:12:31 +0100] - id2entry get error 22
[19/Feb/2009:17:12:31 +0100] schema-compat-plugin - warning: no entries set up under cn=users,cn=compat,dc=ipa,dc=hoekstra,dc=lan
[19/Feb/2009:17:12:31 +0100] - WARNING: ldbm instance userRoot already exists
[19/Feb/2009:17:12:31 +0100] binder-based resource limits - nsLookThroughLimit: parameter error (slapi_reslimit_register() already registered)
[19/Feb/2009:17:12:31 +0100] - start: Resource limit registration failed
[19/Feb/2009:17:12:31 +0100] - Failed to start database plugin ldbm database
[19/Feb/2009:17:12:31 +0100] - Error: Failed to resolve plugin dependencies
[19/Feb/2009:17:12:31 +0100] - Error: preoperation plugin 7-bit check is not started
[19/Feb/2009:17:12:31 +0100] - Error: accesscontrol plugin ACL Plugin is not started
[19/Feb/2009:17:12:31 +0100] - Error: preoperation plugin ACL preoperation is not started
[19/Feb/2009:17:12:31 +0100] - Error: object plugin Class of Service is not started
[19/Feb/2009:17:12:31 +0100] - Error: preoperation plugin HTTP Client is not started
[19/Feb/2009:17:12:31 +0100] - Error: preoperation plugin ipa-winsync is not started
[19/Feb/2009:17:12:31 +0100] - Error: extendedop plugin ipa_pwd_extop is not started
[19/Feb/2009:17:12:31 +0100] - Error: preoperation plugin krbPrincipalName uniqueness is not started
[19/Feb/2009:17:12:31 +0100] - Error: database plugin ldbm database is not started
[19/Feb/2009:17:12:31 +0100] - Error: object plugin Legacy Replication Plugin is not started
[19/Feb/2009:17:12:31 +0100] - Error: object plugin Multimaster Replication Plugin is not started
[19/Feb/2009:17:12:31 +0100] - Error: postoperation plugin referential integrity postoperation is not started
[19/Feb/2009:17:12:31 +0100] - Error: object plugin Roles Plugin is not started
[19/Feb/2009:17:12:31 +0100] - Error: object plugin Views is not started

Ouch!! That hurt.

Subsequent start attempts no longer reported the version mismatch, but the server still would not start.
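If you want to check whether your own errors log shows the same two symptoms, a quick sketch like the following may help. The function name and its messages are my own invention. It only greps for strings that appear in the log above.

```shell
# Hypothetical helper: look for the two libdb symptoms from the log above.
# Prints one line per symptom found; returns 1 when neither appears.
check_dirsrv_errors() {
  log=$1
  status=1
  # ".*" also matches a typographic apostrophe in "doesn't".
  if grep -q 'libdb: Program version .* match environment version' "$log"; then
    echo "libdb version mismatch: environment was created by another Berkeley DB version"
    status=0
  fi
  if grep -q 'past end of log' "$log"; then
    echo "database file LSN lies past the end of the transaction log"
    status=0
  fi
  return "$status"
}
```

Usage would be along the lines of check_dirsrv_errors /var/log/dirsrv/slapd-<instance>/errors.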

Now, this bug report helped me out big time: https://bugzilla.redhat.com/show_bug.cgi?id=470084. It appears that id2entry.db4 is the main database, holding the entries themselves, and that the other (corrupted) database files can be regenerated from it. I'm not sure it holds EVERYTHING, but it was enough for me to repair my dirsrv.

What I basically had to do was use the dbscan tool to dump the contents of id2entry.db4, located in /var/lib/dirsrv/slapd-<instance>/db/userRoot, by running this command from that directory:

dbscan -f id2entry.db4 >~/ldap.ldif
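In hindsight, I would wrap this step so that the database directory gets copied before anything else happens to it. This is a sketch that assumes the default /var/lib/dirsrv location, which a made-up DIRSRV_ROOT variable can override. dump_id2entry is a hypothetical name, and the dbscan call is the same one as above.

```shell
# Hypothetical helper: copy an instance's db directory, then dump
# id2entry.db4 with dbscan. Run it while dirsrv is stopped.
#   $1 = instance name (the <instance> in slapd-<instance>)
#   $2 = file to write the dump to
dump_id2entry() {
  root=${DIRSRV_ROOT:-/var/lib/dirsrv}   # assumed default data directory
  db="$root/slapd-$1/db"
  # Keep an untouched copy so a failed attempt can start over from it.
  cp -a "$db" "$db.bak.$(date +%Y%m%d%H%M%S)" || return 1
  dbscan -f "$db/userRoot/id2entry.db4" > "$2"
}
```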

The dbscan command produced a dump of the database that resembles LDAP data. However, it is not in the LDIF format needed to re-import the data into a fresh database, so I had to edit the file. I used vim for that, because it is very powerful at searching and replacing strings. Here are the commands I used. They are probably useful to others as well:

Open the editor:
# vi ~/ldap.ldif
Remove all lines that start with 'id: <number>', because those are not valid LDIF:
:g/^id: [0-9]/d
Next, remove the tab characters that indent the file, because LDIF doesn't accept them. Where it says <tab>, press the Tab key instead of typing it out, or write \t:
:%s/^<tab>//
In LDIF, a line that starts with a space is a continuation of the previous line. That gets in the way when removing lines, because the continuation lines would have to go as well. So I eliminated all multi-line values by removing every newline-followed-by-space combination, like so:
:%s/\n //
Then remove the attributes mentioned in the bug report above by simply deleting their lines:
:g/\(nsUniqueId\|creatorsName\|modifiersName\|createTimestamp\|modifyTimestamp\|parentid\|entryid\|entrydn\|numSubordinates\):/d
Finally, the LDIF file is ready, so save and exit:
:wq
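If you would rather not do this interactively, the same four edits can be approximated with awk. This is a sketch, not what I ran. The function name clean_dbscan_dump is made up, and it assumes the dump has the shape described above: an 'id: <number>' line per entry, tab-indented attribute lines, and space-prefixed continuation lines. One deliberate difference is that the attribute names are anchored at the start of the line, which is slightly stricter than the vim pattern.

```shell
# Hypothetical helper: non-interactive take on the vim edits above.
# Reads a dbscan dump on stdin and writes cleaned LDIF to stdout.
clean_dbscan_dump() {
  awk '
    # Print the buffered (joined) line unless it carries one of the
    # attributes from the bug report.
    function flush() {
      if (have && buf !~ /^(nsUniqueId|creatorsName|modifiersName|createTimestamp|modifyTimestamp|parentid|entryid|entrydn|numSubordinates):/)
        print buf
      have = 0
    }
    # Same as :g/^id: [0-9]/d
    /^id: [0-9]/ { next }
    {
      sub(/^\t/, "")                      # same as :%s/^<tab>//
      if (have && substr($0, 1, 1) == " ") {
        buf = buf substr($0, 2)           # same as :%s/\n //
        next
      }
      flush()
      buf = $0
      have = 1
    }
    END { flush() }
  '
}
```

Usage would be clean_dbscan_dump < ~/ldap.ldif > ~/ldap-clean.ldif. Check the result by eye before importing it.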

With the freshly obtained LDIF file, I then imported it using the ldif2db tool that ships with dirsrv:

/usr/lib64/dirsrv/slapd-<instance>/ldif2db -n userRoot -i ~/ldap.ldif

The result, shown below, finally cheered me up:

[19/Feb/2009:23:35:18 +0100] - dblayer_instance_start: pagesize: 4096, pages: 191327, procpages: 47727
[19/Feb/2009:23:35:18 +0100] - cache autosizing: import cache: 204800k
[19/Feb/2009:23:35:18 +0100] - li_import_cache_autosize: 50, import_pages: 51200, pagesize: 4096
[19/Feb/2009:23:35:18 +0100] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[19/Feb/2009:23:35:18 +0100] - dblayer_instance_start: pagesize: 4096, pages: 191327, procpages: 47727
[19/Feb/2009:23:35:18 +0100] - cache autosizing: import cache: 204800k
[19/Feb/2009:23:35:18 +0100] - li_import_cache_autosize: 50, import_pages: 51200, pagesize: 4096
[19/Feb/2009:23:35:18 +0100] - import userRoot: Beginning import job...
[19/Feb/2009:23:35:18 +0100] - import userRoot: Index buffering enabled with bucket size 62
[19/Feb/2009:23:35:18 +0100] - import userRoot: Processing file "/root/ldap.ldif"
[19/Feb/2009:23:35:18 +0100] - import userRoot: Finished scanning file "/root/ldap.ldif" (46 entries)
[19/Feb/2009:23:35:19 +0100] - import userRoot: Workers finished; cleaning up...
[19/Feb/2009:23:35:19 +0100] - import userRoot: Workers cleaned up.
[19/Feb/2009:23:35:19 +0100] - import userRoot: Cleaning up producer thread...
[19/Feb/2009:23:35:19 +0100] - import userRoot: Indexing complete.  Post-processing...
[19/Feb/2009:23:35:19 +0100] - import userRoot: Flushing caches...
[19/Feb/2009:23:35:19 +0100] - import userRoot: Closing files...
[19/Feb/2009:23:35:19 +0100] - All database threads now stopped
[19/Feb/2009:23:35:19 +0100] - import userRoot: Import complete.  Processed 46 entries in 1 seconds. (46.00 entries/sec)
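The import step can be wrapped in the same way. This is a sketch, not a tested script. import_ldif and DIRSRV_BIN are made-up names, and /usr/lib64/dirsrv is the path on this 64-bit VM; I assume /usr/lib/dirsrv on a 32-bit system. ldif2db works offline, as the warning in the log above says no other process may access the database, so the sketch stops the server first.

```shell
# Hypothetical helper: run the instance's ldif2db script for userRoot.
#   $1 = instance name, $2 = cleaned LDIF file
import_ldif() {
  bindir=${DIRSRV_BIN:-/usr/lib64/dirsrv}   # 64-bit path used in this post
  # Stopping may fail if dirsrv is already down; that is fine here.
  service dirsrv stop
  "$bindir/slapd-$1/ldif2db" -n userRoot -i "$2"
}
```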

Running service dirsrv start brought the directory server back up. After that I could start krb5kdc again, finally restoring my Kerberos realm.
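The order matters here, because krb5kdc gets its database from dirsrv. Below is a minimal sketch of that restart. restart_ipa_stack is a hypothetical name; the service commands are the ones used above.

```shell
# Hypothetical helper: start dirsrv first, and the KDC only once the
# directory server is up, since krb5kdc depends on it for its database.
restart_ipa_stack() {
  if ! service dirsrv start; then
    echo "dirsrv did not start; check /var/log/dirsrv/slapd-<instance>/errors" >&2
    return 1
  fi
  service krb5kdc start
}
```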

I found it both frustrating and interesting, and I learnt a lot from it. I hope this information helps if you happen to run into a similar problem.