Kerberos Troubleshooting – A few approaches

It is way too long ago since my last blog post. These were or are busy weeks for me. Any way, I finally found some time to start writing a blog post about a special setup for kerberos authentication of Oracle databases. It is about configuring kerberos authentication for multiple database servers with only one active directory account and corresponding Service Priciple Names (SPN). Additionally there is an challenge, that the keytab file should only be created with ktutil directly on the DB server. Access to a Windows server and use of ktpass.exe is not possible. I did setup a nice test case on a couple of compute instances on Oracle cloud infrastructure. During the verification of the test setup I had to realise that the kerberos authentication does not work as planned. Until now it is not possible to create a keytab file with ktutil that I can use successfully with Active Directory. The same kerberos configuration with a keytab create with ktpass.exe on the AD server does work. But that’s on other story…

The aim of this blog post is to sum up a couple of troubleshooting actions I came across. Kerberos itself is around since a couple of decades. Therefore you will find various documentation, RFC, etc. But it is not always easy to recognise what is still relevant and what not. Mainly because the implementation of Kerberos at both Oracle and Microsoft is not necessarily the same or 100% MIT Kerberos compliant. The fact that there are different versions of Oracle, MS AD and Kerberos makes it even more exciting 🙂

Basics

A basic requirement for Kerberos is the network and time configuration.

  • Problem: okinit does fail with clock skew too great
  • Cause: The systems involved must be synchronous in terms of system time e.g. using a NTP service to configure date / time. If the system times differ to much you will receive this error when using okinit.
  • Solution: Configure proper system times using NTP service. Small time drifts can be covered by setting SQLNET.KERBEROS5_CLOCKSKEW=300 in sqlnet.ora
  • Problem: Miscellaneous errors due to wrong / missing network configuration.
  • Cause: Using CNAME rather A records, no DNS configuration, no revers lookkup etc
  • Solution: Configure proper DNS name resolution for database service as well MS active directory service. Each system must be able to be resolved by name or IP address. Kerberos will look for service principle names based on A records.
oracle@db:/u00/app/oracle/network/admin/ [TDB190S] cd
oracle@db:~/ [TDB190S] nslookup win2016ad.trivadislabs.com
Server:		10.0.1.4
Address:	10.0.1.4#53

Name:	win2016ad.trivadislabs.com
Address: 10.0.1.4

oracle@db:~/ [TDB190S] nslookup 10.0.1.4
4.1.0.10.in-addr.arpa	name = win2016ad.trivadislabs.com.

oracle@db:~/ [TDB190S] nslookup db
Server:		10.0.1.4
Address:	10.0.1.4#53

db.trivadislabs.com	canonical name = ol7db19.trivadislabs.com.
Name:	ol7db19.trivadislabs.com
Address: 10.0.1.6

oracle@db:~/ [TDB190S] nslookup 10.0.1.6
6.1.0.10.in-addr.arpa	name = ol7db19.trivadislabs.com.

Trace and Log Files

Kerberos Trace

As of Oracle 12c release 2 it is possible to enable kerberos tracing by setting KRB5_TRACE to a trace file. This logs the Kerberos calls in the current session.

export KRB5_TRACE=/u00/app/oracle/network/admin/kerberos.trc
oracle@db:~/ [TDB190S] okinit king

Kerberos Utilities for Linux: Version 19.0.0.0.0 - Production on 08-JUN-2020 20:54:16

Copyright (c) 1996, 2019 Oracle.  All rights reserved.

Configuration file : /u00/app/oracle/network/admin/krb5.conf.
Password for king@TRIVADISLABS.COM:

A sample output of a kerberos trace file:

oracle@db:~/ [TDB190S] head -10 /u00/app/oracle/network/admin/kerberos.trc
[5645] 1591649656.590082: Getting initial credentials for king@TRIVADISLABS.COM
[5645] 1591649656.590084: Sending unauthenticated request
[5645] 1591649656.590085: Sending request (199 bytes) to TRIVADISLABS.COM
[5645] 1591649656.590086: Resolving hostname ad.trivadislabs.com
[5645] 1591649656.590087: Sending initial UDP request to dgram 10.0.1.4:88
[5645] 1591649656.590088: Received answer (196 bytes) from dgram 10.0.1.4:88
[5645] 1591649656.590089: Sending DNS URI query for _kerberos.TRIVADISLABS.COM.
[5645] 1591649656.590090: No URI records found
[5645] 1591649656.590091: Sending DNS SRV query for _kerberos-master._udp.TRIVADISLABS.COM.
[5645] 1591649656.590092: Sending DNS SRV query for _kerberos-master._tcp.TRIVADISLABS.COM.

Oracle SQLNet tracing

For Kerberos troubleshooting with Oracle SQLNet it is helpful to disable ADR tracing. Not mandatory, but makes life a bit easier. Set DIAG_ADR_ENABLED in sqlnet.ora to OFF.

DIAG_ADR_ENABLED=OFF

Before KRB5_TRACE was available, okinit calls could only be traced with sqlnet.ora and TRACE_LEVEL_OKINIT. See also MOS note 162668.1. The parameter does not make sense when you already use KRB5_TRACE.

TRACE_LEVEL_OKINIT=SUPPORT
TRACE_DIRECTORY_OKINIT=/u00/app/oracle/network/
TRACE_FILE_OKINIT=okinit.trc

For further analysis you usually have to switch on SQLNet Tracing. Don’t even thing about setting an other level than SUPPORT (16). Kerberos calls are only available with the highest level.

TRACE_LEVEL_OKINIT=SUPPORT
TRACE_DIRECTORY_OKINIT=/u00/app/oracle/network/
TRACE_FILE_OKINIT=okinit.trc

Enable tracing for SQLNet clients:

TRACE_LEVEL_CLIENT=SUPPORT
TRACE_DIRECTORY_CLIENT= /u00/app/oracle/network/trc
TRACE_FILE_CLIENT=sqlnet_client.trc

Enable tracing for SQLNet Server:

TRACE_LEVEL_SERVER=SUPPORT
TRACE_DIRECTORY_SERVER= /u00/app/oracle/network/trc
TRACE_FILE_SERVER=sqlnet_server.trc

The errors in the trace files are not always obvious. You can find a few infos and hint in MOS note 185897.1. But most of the time there is no way around searching for the corresponding error or function call in Oracle Support or the search engine of choice.

Network Tracing

The next level is to trace the network calls. Depending on the environment you can directly use Wireshark. But it is much easier to first create a network dump via command line and to analyse it later using Wireshark. I use tcpdump on my OCI environment and download the trace file to my MacBook, where I then use Wireshark.

Get the available interfaces:

sudo tcpdump -D
1.nflog (Linux netfilter log (NFLOG) interface)
2.nfqueue (Linux netfilter queue (NFQUEUE) interface)
3.usbmon1 (USB bus number 1)
4.ens3
5.any (Pseudo-device that captures on all interfaces)
6.lo [Loopback]

Start tracing for interface ens3:

sudo tcpdump -i ens3 -s 65535 -w /tmp/network_okcreate.trc

Keep it running until while testing the kerberos authentication. As soon as done copy the trace file to the client an open it using Wireshark. The following picture does show a trace dump where the kerberos protocol has been selected.

Wireshark sample output

A part of the kerberos packet is encrypted and not visible as you can see in following picture.

Wireshark enc

Kerberos does use the service’s secret key to encrypt these messages. You can import the keytab file into Wireshark to decrypt the messages. For this purpose the keytab file must be specified in Wireshark in the preferences. Click Edit > Preferences > Protocols > KRB5.

Wireshark Preferences

You now see the message content of the packet. This is in particular useful when you have to analyse issues related to ticket size, missing groups etc.

Wireshark decrypted

Conclusion

Unfortunately my Kerberos problem is still not solved. Nevertheless I did get the opportunity to practice a couple of Kerberos tracing methods. The introduction of KRB5_TRACE did simplify tracing a bit, but in most case you still have to use SQLNet or network tracing to find the root cause of you Kerberos problem. A direct solution is unfortunately not always found with tracing. At least you have all the relevant information to search My Oracle Support, open a service request or try your luck at googling for a solution.

Good luck with your Kerberos setup. 😉

References

Some links related to this blog post:

  • Kerberos Troubleshooting Guide [185897.1]
  • Master Note For Kerberos Authentication [1375853.1]
  • How to Trace Unix System Calls [110888.1]
  • Tracing Okinit [162668.1]
  • How to Enable Oracle SQL*Net Client, Server, Listener, Kerberos and External procedure Tracing from Net Manager [395525.1]
  • Requesting kerberos TGT with OKINT errors with okinit: Clock skew too great in 12.1.0.2 [2312008.1]