Prologue

Last week, we were fixing some of the critical bugs in our internal application. We are nearing our 1.0 release (this is a greenfield project for us!), so some of us are polishing the application and fixing any bugs we find. One of them stood out as among the most longstanding yet subtle bugs we have seen so far, and it was related to session expiry.

The problem was straightforward: a user session kept expiring after 30 minutes. We were using Next.js with Keycloak as our Auth Provider. Keycloak was configured with a 30-minute access token and a 1-day refresh token. However, user sessions were lost after 30 minutes, leading us to suspect that the token refresh was not occurring. We were using NextAuth v4 to manage user logins and sessions.

Debugging Locally

To reproduce and debug the problem in our local environment, we decided to speed things up. Since we didn’t want to wait 30 minutes or 1 day for a token to expire, we reduced the lifetimes in our local Keycloak environment to 30 seconds for the access token and 1 minute for the refresh token. This meant that the access token would expire after 30 seconds, necessitating a refresh using the refresh token. We could make API requests with the access token while it was valid. After 1 minute, both tokens would have expired, redirecting us to the login page.

With this thought process in place, we ran our application locally, and surprisingly, it worked flawlessly. We were able to log in, use the token for API requests, and refresh the token after the access token expired, right up until the refresh token expired.

We even extended the refresh token expiration time to 2 minutes so that the system would attempt to refresh the access token three times: after 30, 60, and 90 seconds. After 120 seconds (2 minutes), both tokens should have expired. We ran it again, and it still worked as expected. The tokens refreshed three times as planned, and both tokens expired after 2 minutes. Protected endpoint calls continued functioning during those two minutes.
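The arithmetic behind that expectation can be sketched in a few lines. This is only an illustration, assuming a refresh is attempted exactly when each access token expires and that the refresh token has a fixed lifetime counted from login; `refreshTimes` is a hypothetical helper, not part of NextAuth or Keycloak.

```typescript
// Hypothetical helper: returns the moments (in seconds since login) at which
// a refresh would be attempted before the refresh token itself expires.
// Assumes one refresh per access-token expiry and a fixed refresh lifetime.
function refreshTimes(
  accessTtlSeconds: number,
  refreshTtlSeconds: number,
): number[] {
  if (accessTtlSeconds <= 0) {
    throw new Error("accessTtlSeconds must be positive");
  }
  const times: number[] = [];
  for (let t = accessTtlSeconds; t < refreshTtlSeconds; t += accessTtlSeconds) {
    times.push(t);
  }
  return times;
}

// Second local setup: 30 s access token, 2 min refresh token.
console.log(refreshTimes(30, 120)); // refreshes at 30, 60 and 90 seconds

// First local setup: 30 s access token, 1 min refresh token.
console.log(refreshTimes(30, 60)); // a single refresh at 30 seconds
```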

We realised that the issue had nothing to do with expiration times or the refresh token action itself. Yes, the token refresh wasn’t happening, but that wasn’t the root problem. There was something peculiar about our own configuration of NextAuth.

Third Act

Essentially, we had a route at /api/auth/[...nextauth], which utilised NextAuth’s route handlers. These handlers operate using an object that primarily contains external AuthenticationProvider configurations and callbacks for managing JWTs and Sessions.

Among these configurations, we discovered an interesting line:

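// api/auth/[...nextauth]/route.ts
import NextAuth, { type AuthOptions } from "next-auth";
import KeycloakProvider from "next-auth/providers/keycloak";

const authOptions: AuthOptions = {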
  providers: [
    KeycloakProvider({
      clientId: process.env.KEYCLOAK_CLIENT_ID,
      clientSecret: process.env.KEYCLOAK_CLIENT_SECRET,
      issuer: process.env.KEYCLOAK_ISSUER,
    }),
  ],
  session: {
    strategy: "jwt",
    maxAge: 30 * 60, // <<<< INTERESTING LINE
  },
  callbacks: {
     ....
  },
};

const handler = NextAuth(authOptions);
export { handler as GET, handler as POST };

This line essentially instructed NextAuth to limit the max-age of the session to 30 minutes. Why did we need to configure session properties in NextAuth in addition to those in Keycloak? The reason is that NextAuth uses its own session cookie to manage its sessions. We realised that we were storing the access token as part of this session cookie during the callback, and our jwt callback logic relied on that access token within the session cookie for refreshing tokens.
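A callback of this kind usually follows a pattern like the one below. This is a simplified sketch and not our exact code: the `StoredToken` shape, the `Refresher` type, and the `refreshIfExpired` function are hypothetical names, and the actual call to Keycloak's token endpoint is abstracted behind the injected `refresh` function. What it does show is that the refresh logic depends entirely on what is stored in the NextAuth session token.

```typescript
// Assumed shape of the data kept inside the NextAuth session JWT.
interface StoredToken {
  accessToken: string;
  refreshToken: string;
  accessTokenExpiresAt: number; // epoch milliseconds
  error?: "RefreshAccessTokenError";
}

// Hypothetical signature for whatever performs the refresh against Keycloak.
type Refresher = (refreshToken: string) => Promise<{
  accessToken: string;
  refreshToken: string;
  expiresInSeconds: number;
}>;

// Simplified stand-in for the body of a `jwt` callback: reuse the stored
// access token while it is valid, otherwise exchange the refresh token.
async function refreshIfExpired(
  token: StoredToken,
  refresh: Refresher,
  now: number = Date.now(),
): Promise<StoredToken> {
  if (now < token.accessTokenExpiresAt) {
    return token; // access token still valid, nothing to do
  }
  try {
    const fresh = await refresh(token.refreshToken);
    return {
      accessToken: fresh.accessToken,
      refreshToken: fresh.refreshToken,
      accessTokenExpiresAt: now + fresh.expiresInSeconds * 1000,
    };
  } catch {
    // Flag the failure so the UI can send the user back to the login page.
    return { ...token, error: "RefreshAccessTokenError" };
  }
}
```

If the session cookie holding this token is gone, there is no refresh token left to exchange, no matter how long Keycloak would still have accepted it.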

We understood the problem then: NextAuth’s session expires after 30 minutes and does not consider our Keycloak tokens beyond that point. In other words, NextAuth’s session expires before Keycloak’s session does. When we first tried to reproduce the bug locally, we shortened only the Keycloak token lifetimes and left NextAuth’s session max-age at 30 minutes. Keycloak’s tokens therefore expired well within the NextAuth session, so there was always time for a refresh and the bug never appeared.

Finally, we replicated the problem locally by setting the session max-age to 30 seconds (the same duration we specified for Keycloak’s access token during earlier debugging).

So what’s the Fix?

Another interesting bit is that the default max-age for a NextAuth session is actually 30 days! Somehow, we had reduced it to just 30 minutes! This means that removing the max-age altogether should resolve the issue, and indeed it did.

We tested this change locally by raising the session max-age from 30 seconds to 2 minutes (the same duration we had set for Keycloak’s refresh token in our local environment). In our real configuration, the refresh token expires after 1 day, so users need to log in again with Keycloak at that point even if their NextAuth session remains active. Therefore, we ultimately settled on a lifetime of 1 day for both sessions.
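In terms of the `session` block shown earlier, the aligned setting would look roughly like this. It is a sketch of the relevant fragment only; `ONE_DAY_SECONDS` is a name introduced here for illustration, and the value assumes the 1-day Keycloak refresh token lifetime described above.

```typescript
// Assumed value: matches the 1-day Keycloak refresh token lifetime.
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Same shape as the `session` block in authOptions; NextAuth expects
// maxAge in seconds.
const session = {
  strategy: "jwt" as const,
  maxAge: ONE_DAY_SECONDS,
};
```

Deriving the number from a named constant makes it easier to spot when the two lifetimes drift apart again.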

Epilogue

The fix alone wasn’t sufficient to completely eliminate the bug; we also had to remove refetchInterval from the SessionProvider component to prevent automatic session polling.

This one bug led us to understand how NextAuth operates. We were unaware that NextAuth has its own “session,” which we had confused with Keycloak’s session. Without encountering this bug, I’m unsure if we would have grasped how NextAuth functions.

This may be just the first of many blog posts discussing our learnings from fixing such subtle bugs!

See you all when we find another interesting bug 🐞.