Multi-tenant SAML in an afternoon (using SSOReady)

Reading Time: 16 minutes

tl;dr: I took SSOReady up on their marketing tagline of “SAML in an afternoon”. Overall, it was a positive experience, and although I didn’t quite have a production-ready deployment by the end of the afternoon, I think their tagline is a fair approximation of the effort.

Introduction

Enterprise SSO logins are a weird dichotomy. Implementing something like OIDC or SAML isn’t necessarily hard, but there is always a debate about whether to implement it internally or use a vendor. There never seems to be a simple answer.

Your framework might support it, but you probably need to know more about SAML than you want to know. If you already started with an IdP, congratulations, you might be in luck. If it’s a vendored IdP, for a not-so-small fee, you can add OIDC or SAML on a per-customer basis. The industry average seems to be between $50 and $100 per customer.

If you started with the username/password implementation that came with your framework, like most startups do, you now have a decision to make. Do you bolt a SAML implementation onto your application, or make the full switch to an IdP and pay the cost of migration (possibly throwing away a bunch of code)?

Step 1: Decide if an IdP is right for you.

If you haven’t started your application, I’ll just say, yes. It’s likely a great option if you can afford the upfront cost. It’ll save you from a later migration and from having to implement a bunch of user onboarding stuff yourself. For a long time, your only real OSS option here was Keycloak, which historically struggled in multi-tenant environments. However, in recent years, a number of good OSS options have really come to maturity. A full analysis of the options is out of scope here for today, but take a look at Zitadel, Keycloak, and Authentik. Most have hosted options for reasonable prices.

If you already have a working application, that’s where the decision gets a little bit harder. At Masset, we debated for a long time whether we should just bite the bullet and switch to an IdP or continue with the framework-provided system + SAML. In the end, here are a few things we discussed that you’ll want to consider:

  1. User migration – unless you want to maintain 2 user databases, you’ll want to migrate your users to your IdP
  2. Two user databases – JK, after your migration, you still might need two user databases if you need quick access to user information all over your app. Think about pages that show histories, owners, searching, etc.
  3. Customized user flows – If you have branded user flows, make sure your IdP supports the right customization. Things like branded login, look and feel, pretty emails, etc might or might not be supported. You might need to be willing to just use what they give.
  4. Custom login requirements – Do you have custom login requirements per customer? Most IdPs will make this much easier than implementing it yourself.
  5. Trust – Do you trust yourself or a third party’s implementation of user management? Answers here will depend on your company and how jaded the tech industry has made you.
  6. Cost – Last but not least, cost. I’ll just come out and say it: IMHO, what most vendors charge for being an IdP is outrageous. Charging by MAU in an IdP? Really? We all know that the marginal cost for me to have a user in your system is essentially 0. You really gonna charge me 10 cents a month for a row in your database?

In the end, Masset decided that we will approach this in two stages. First, let’s get SSO working for urgent customer onboarding. Then, let’s make a full switch to a self-hosted OSS IdP in the future when we have enough company traction that the migration justifies the cost.

That’ll also give a few of the OSS IdPs a few more years to mature for multi-tenant B2B SaaS use cases. It’s surprising how immature support for that model is, given how popular it is.

So what to do instead? Enter SSOReady…

SSOReady isn’t an IdP

When I first stumbled on SSOReady on Hacker News, my first thought was simply, “Finally!”. I’ve addressed this same transition at multiple startups in the past, and multiple times we’ve ended up paying a crazy amount to vendors (cough, Auth0, Okta, etc, cough) just to act as a SAML/OIDC proxy. We used none of the other features. And I suspect a lot of other companies do the same.

SSOReady offered a great in-between option. They aren’t an IdP. They are simply a way to add SAML login to your application. They handle all the SAML logic for you and simply provide a callback to your application when a user is properly authenticated. You don’t change your user system, just add a new login path. A new key for the door already in your system, if you will.

Another great benefit: their pricing is reasonable. No per-user nonsense. Just a flat fee that is reasonably close to what it would take me to run it myself in AWS. Oh, and it’s MIT-licensed. So if pricing ever does become outrageous, we have an exit strategy to mitigate costs.

Unfortunately, SSOReady recently took down their public pricing. So take that as you will. However, a brief 15-minute phone call was all it took to get some solidified pricing at a fraction of what I paid the big players in the past.

“SAML in an Afternoon”

SSOReady’s early tagline was “SAML in an Afternoon”. It looks like it’s been removed from a few of their marketing materials, but it still shows up in a few places in their documentation.

I never hold people too harshly to their marketing materials. But I thought it would be a fun exercise to see if I could actually hit that mark. So I carved out a Friday afternoon free of meetings and decided to give it a go.

I started from scratch, with mostly* no prior knowledge of how their system worked. After spending 30 minutes iterating between asking ChatGPT how to set up macOS to take a screenshot every 10s and writing random bash scripts, we were ready to go. Screenshots every 10 seconds would provide a journal of what was going on, so that I could work through the afternoon without stopping to write down my thoughts.

And so I got started!

* This isn’t entirely true as I had read a few of their pages to get an understanding that they weren’t an IdP and how they fit into auth flows. But it’s pretty close.

SAML Dev Diary

Below is the diary of implementation. Italicized items are thoughts I was having as I was executing. The rest is a description of what was happening.

[1:30 PM] I’ve already wasted 30 minutes of this experiment trying to get automated screenshots working. So SSOReady is already at a disadvantage because I already have Friday night plans at 5 PM that I can’t miss!

[1:33 PM] Reading through the SAML Quickstart page.

[1:35 PM] Reading through the Integrating SAML with your Login UI page. I appreciate that this page offers multiple different approaches. I think we’ll go with Option 3 for our use case. Based on that, first things first: we need a “discovery” endpoint that takes an email address as input and gives us login settings as output.

[1:42 PM] Start on the /discovery endpoint. Adding backend code first. POST request with a simple body of the current username (may or may not be a full email).

Good thing we already have a way of doing global rate limits by IP address. Exposing this endpoint could get problematic… we’ll need to do a security review on it before shipping it. And we’ll need to fine-tune these limits.
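Masset’s existing limiter isn’t shown in the post, so the following is only a hypothetical sketch of what a per-IP, fixed-window limit on an unauthenticated endpoint like /discovery can look like. The class name, the limits, and the injectable clock are assumptions for illustration, not Masset’s implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical per-IP fixed-window limiter for an unauthenticated endpoint such as /discovery.
// Not Masset's actual implementation; limits and window size are placeholders to tune.
final class DiscoveryRateLimiter {
    // Immutable snapshot of one IP's current window.
    private record Window(long start, int count) {}

    private final int maxRequestsPerWindow;
    private final long windowMillis;
    private final LongSupplier clockMillis; // injectable so tests can control time
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    DiscoveryRateLimiter(int maxRequestsPerWindow, long windowMillis, LongSupplier clockMillis) {
        this.maxRequestsPerWindow = maxRequestsPerWindow;
        this.windowMillis = windowMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns true if this request from clientIp is within the limit. */
    boolean tryAcquire(String clientIp) {
        long now = clockMillis.getAsLong();
        long windowStart = now - (now % windowMillis);
        // compute() is atomic per key on ConcurrentHashMap, so concurrent requests
        // from the same IP can't lose increments.
        Window w = windows.compute(clientIp, (ip, current) ->
            (current == null || current.start() != windowStart)
                ? new Window(windowStart, 1)
                : new Window(windowStart, current.count() + 1));
        return w.count() <= maxRequestsPerWindow;
    }
}
```

A fixed window is the simplest option; a production version would also need to evict stale IPs from the map and be careful about how the client IP is derived behind proxies.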

[1:43 PM] After writing the parsing logic in the controller… Alright, let’s do this right. Add Service and Repository for cleaner code.

[1:48 PM] We already have domain claims, so it’s nice we don’t have to add that. That’ll provide the mechanism by which we can go from email -> domain -> company -> login settings. We can’t just go from email -> company because the user may not be provisioned (JIT). Does shortcutting in cases where the user does exist buy us anything? I’ll have to think about that later…
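The email -> domain -> company chain can be sketched roughly as follows. The in-memory map stands in for the domain-claims table and DomainResolver is a hypothetical name; inputs without a usable @ simply resolve to nothing, since the username may not be a full email.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch: resolve a typed username to a company via claimed domains.
// The map stands in for a domain-claims repository (keys are lowercase domains).
final class DomainResolver {
    private final Map<String, UUID> companyIdByClaimedDomain;

    DomainResolver(Map<String, UUID> companyIdByClaimedDomain) {
        this.companyIdByClaimedDomain = companyIdByClaimedDomain;
    }

    /** Extracts a normalized domain, or empty if the input isn't email-shaped. */
    static Optional<String> extractDomain(String username) {
        if (username == null) {
            return Optional.empty();
        }
        String trimmed = username.trim();
        int at = trimmed.lastIndexOf('@');
        if (at <= 0 || at == trimmed.length() - 1) {
            return Optional.empty(); // no local part or no domain
        }
        return Optional.of(trimmed.substring(at + 1).toLowerCase(Locale.ROOT));
    }

    /** email -> domain -> company; works even if the user isn't provisioned yet (JIT). */
    Optional<UUID> resolveCompany(String username) {
        // Optional.map turns a missing map entry (null) into Optional.empty().
        return extractDomain(username).map(companyIdByClaimedDomain::get);
    }
}
```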

[1:53 PM] Augmented the company settings to allow configuration of permitted login types. This way I don’t have to call SSOReady just to check if they should be using SAML. I can just check the database.
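A minimal sketch of what “permitted login types” on the company settings might look like, assuming a simple two-value enum; the names are hypothetical. Keeping this in the database means the SAML-or-not decision is a local lookup, and the same permits(...) check is what a password login path could consult to refuse SAML-only companies.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical shape for per-company login settings stored alongside the company row.
final class CompanyLoginSettings {
    enum LoginType { PASSWORD, SAML }

    private final Set<LoginType> permitted;

    CompanyLoginSettings(Set<LoginType> permitted) {
        if (permitted.isEmpty()) {
            throw new IllegalArgumentException("a company needs at least one login type");
        }
        this.permitted = Set.copyOf(permitted); // immutable defensive copy
    }

    /** Default for companies without SSO configured. */
    static CompanyLoginSettings passwordOnly() {
        return new CompanyLoginSettings(EnumSet.of(LoginType.PASSWORD));
    }

    boolean permits(LoginType type) {
        return permitted.contains(type);
    }

    /** True when discovery should fetch an SSOReady redirect URL for this company. */
    boolean usesSaml() {
        return permits(LoginType.SAML);
    }

    Set<LoginType> permittedLoginTypes() {
        return permitted;
    }
}
```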

[2:00 PM] Discovery endpoint is code-complete. Let’s start the app! Ah crap, build failed. Forgot to update a few instances of the auth configuration. Fixed and running. The endpoint takes an email as input and outputs: 1. permitted login types, and 2. the SAML redirect URL, if any. Alright, let’s move on to the frontend.

[2:02 PM] First things first, a new custom react-query hook to get the data.

[2:09 PM] Alright, let’s trigger the hook based on what the user has typed in the login. Ugggghh, react-hook-form should be so much simpler for things like this. Just feels so heavy. How do I get the current value again?

[2:11 PM] Ask ChatGPT how to get the current RHF value so I can pass it to my custom hook. I think I know how to do this, but I know getValues() doesn’t work the way I think, because I’ve gotten that wrong multiple times in the past.

Alright, it’s watch(), let’s wire that up.

[2:16 PM] All wired up. Let’s add temporary UI so we can see what options are discovered when typing emails.

Huh. 500 error. What’s going on?

[2:20 PM] Oh, domain claims have row-level-security enforced that scopes them to the current company. No current company when no one is logged in. Update RLS policy to allow global searches by domain for system-users.

[2:24 PM] Login settings still not showing on the test UI. Endpoint seems to be returning correctly. Did I map my types wrong?

[2:26 PM] Remember that { foo.bar } will render to nothing in React if bar is a boolean. Let’s just cast these to strings with + ''. UI now shows settings correctly.

[2:30 PM] Create some test companies and users so that we can see differences across user-entered values. Tests with an address at company1.com and one at company2.com work correctly: any value @company1.com shows SAML and any value @company2.com shows username/password. Update the UI to be a bit prettier without the debug information.

I’ll come back and make this all prettier with animations later, before I ship it. But knowing my experience with framer, that’ll take the rest of my afternoon.

[2:40 PM] Oh wait, I guess we’re not all the way done with that endpoint. It needs to return the SAML redirect url as well.

Time to integrate with SSOReady! We need to get the redirect URL for a given company. Sign up for SSOReady account. Create local environment.

What are these callbacks? I think this is where we land when auth succeeds. Oh, that’s probably the second endpoint. I don’t have that yet. I’ll come back and fill this in later.

Looks like SSOReady Organizations == Masset Tenants (companies).

Ahh, SSOReady Organizations have a distinct property called externalOrganizationId… instead of me having to throw it in the org name or random attribute or something. I can just use that to tie the two systems together. I guess I could store their organization id on my company table, but for some reason, it feels simpler this way. These guys really do seem to get this specific use case. Thank heavens.

Created an API Key for use in our application.

[2:48 PM] Alright, configuration on the SSOReady side is done, let’s add their library to our app so we can make calls!

Darn, no JDK SDK. Guess we’ll do it by hand.

Create a Retrofit client for the SSOReady API. Only need a couple of endpoints, so not too bad. Might as well add the code for redeem as well, as we’re likely going to need that soon too.

And wire it up to an OkHttpClient…

[3:10 PM] Let’s externalize the configuration to Spring properties while we’re at it… and update the Terraform files to ensure it won’t be missing from our live environments.

[3:16 PM] Alright, now we update the discovery endpoint code to call the SSOReady client to get the redirect URL if a user’s email is supposed to use SAML login. We look up the company by domain, check if SAML is enabled in the database, then call the client with externalOrganizationId set as the tenant id.

Annnddd… of course it errors out.

Huh. “application/json; charset=UTF8? sub-format not supported.” That’s an odd error. Sub-formats are pretty common practice. I’ll have to reach out to the SSOReady guys to let them know about that one.

In the meantime, update the client to send just “application/json” instead of the default that includes the charset parameter.
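The post’s client is built with Retrofit and OkHttp; to keep this example dependency-free, the sketch swaps in the JDK’s java.net.http API to show the same fix: setting Content-Type to exactly application/json with no charset parameter. The endpoint URI is passed in rather than hard-coded, and the Bearer authorization scheme and the organizationExternalId body field are assumptions to check against SSOReady’s API reference.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch using java.net.http as a stand-in for the Retrofit/OkHttp client in the post.
// ASSUMPTIONS: endpoint URI, "Bearer" auth scheme, and the JSON field name.
final class SsoReadyRequests {
    private SsoReadyRequests() {}

    static HttpRequest redirectUrlRequest(URI endpoint, String apiKey, String organizationExternalId) {
        String json = "{\"organizationExternalId\":\"" + escapeJson(organizationExternalId) + "\"}";
        return HttpRequest.newBuilder(endpoint)
            .header("Authorization", "Bearer " + apiKey)
            // Exactly "application/json": java.net.http never appends a charset parameter.
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json)) // body is encoded as UTF-8
            .build();
    }

    // Minimal escaping for characters that would break a JSON string literal.
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                default -> {
                    if (ch < 0x20) {
                        out.append(String.format("\\u%04x", (int) ch));
                    } else {
                        out.append(ch);
                    }
                }
            }
        }
        return out.toString();
    }
}
```

With the Retrofit/OkHttp stack specifically, OkHttp’s String-based request bodies append a charset parameter when the media type lacks one, and Retrofit’s common JSON converters declare one, so the header typically has to be overridden (for example, in an interceptor).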

[3:21 PM] Test fails again. Whoops. Forgot to add a SAML connection for this organization. Back to the SSOReady UI to do it. Add SAML connection to mocksaml.com.

Ugh. Failed again. But this one is from mocksaml.com. This worked fine when I tested with IdPs. I wonder if there is some missing piece with SSOReady.

Do some Googling to figure out what’s going on. Nothing concrete, but a few GitHub issues on MockSAML about people experiencing the same thing.

Well, I guess we can sign up for a trial with Okta to act as the SAML provider. That’s what our customers use as their Workforce IdP anyway, so might as well test with that instead.

[3:30 PM] Sign up for Okta trial. Get locked out of account, have to install Okta Verify, finally get access.

Uggggggggghhhhh.

[3:41 PM] Alright, let’s set up a SAML connection in Okta.

Nice, SSOReady has documentation specifically for that.

Follow that documentation.

[3:52 PM] Default Okta settings prevent logins without Okta Verify. Sigh.

Probably good to have that enabled by default, but I don’t want to have to add 2FA for a whole bunch of test accounts.

Muck around with policies and flows until I get it finally disabled.

[3:57 PM] Login successful! The SAML process succeeded, but I landed on a 404 page. What’s going on?

Ohhh, yah, that callback needs to be filled in. That’s not implemented yet. I want the callback to go to my API so that I can register the user, not the frontend.

[4:01 PM] New /callback endpoint in the backend. Needs to be unauthenticated as well. Takes saml_access_code as parameter. Passes the access code to the SSOReady client to redeem it. If redemption succeeds, user needs to be authenticated.
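The callback step can be sketched as follows, under assumptions: RedeemClient is a hypothetical wrapper around SSOReady’s redeem call, and the RedeemResult field names (email, organizationExternalId) are illustrative rather than copied from SSOReady’s documentation. Establishing the Spring Security session is left out, since that part is framework-specific.

```java
import java.util.Optional;

// Hypothetical sketch of the /callback flow. RedeemClient would wrap SSOReady's
// "redeem SAML access code" API; field names on RedeemResult are assumptions.
final class SamlCallback {
    record RedeemResult(String email, String organizationExternalId) {}

    interface RedeemClient {
        /** Empty when the code can't be redeemed (expired, reused, unknown). */
        Optional<RedeemResult> redeem(String samlAccessCode);
    }

    interface Outcome {}
    record Authenticated(String email, String organizationExternalId) implements Outcome {}
    record Rejected(String reason) implements Outcome {}

    private final RedeemClient client;

    SamlCallback(RedeemClient client) {
        this.client = client;
    }

    /** Handles the saml_access_code parameter that SSOReady redirects back with. */
    Outcome handle(String samlAccessCode) {
        if (samlAccessCode == null || samlAccessCode.isBlank()) {
            return new Rejected("missing saml_access_code");
        }
        return client.redeem(samlAccessCode)
            .<Outcome>map(r -> new Authenticated(r.email(), r.organizationExternalId()))
            .orElseGet(() -> new Rejected("access code could not be redeemed"));
    }
}
```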

[4:22 PM] If the redemption succeeds, we need to map the user to the users in our database.

This is where I’ll tie in JIT provisioning at some point. We’ll need to make that configurable, because some customers will want SCIM (also offered by SSOReady) and others will want JIT.
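For the user-mapping step, here is a hypothetical sketch of making provisioning configurable per company: JIT creates unknown users on first login, while SCIM companies refuse users that weren’t provisioned ahead of time. The in-memory map stands in for the users table, and the cross-company check is an extra guard added for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch: map a redeemed SAML identity to a local user,
// with the provisioning mode chosen per company.
final class SamlUserMapper {
    enum ProvisioningMode { JIT, SCIM }

    record User(UUID id, String email, UUID companyId) {}

    private final Map<String, User> usersByEmail = new HashMap<>(); // stand-in for the users table

    /** Returns the local user to authenticate, or empty if the login must be refused. */
    Optional<User> resolve(String email, UUID companyId, ProvisioningMode mode) {
        String key = email.trim().toLowerCase(Locale.ROOT);
        User existing = usersByEmail.get(key);
        if (existing != null) {
            // Guard: an identity asserted for one company must not log into another company's user.
            return existing.companyId().equals(companyId) ? Optional.of(existing) : Optional.empty();
        }
        if (mode == ProvisioningMode.SCIM) {
            // SCIM companies provision users ahead of time; unknown users are refused.
            return Optional.empty();
        }
        User created = new User(UUID.randomUUID(), key, companyId); // JIT provisioning
        usersByEmail.put(key, created);
        return Optional.of(created);
    }
}
```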

[4:28 PM] Wiring up Spring Security authentication using manual code path for SecurityContext.

This is the wrong place to do this… I’ll need to move this to an AuthenticationProvider and create a custom authentication token. It’s fine for a POC, but not when I go live.

[4:35 PM] The user is authenticated, but the associated company is empty. What’s going on?

[4:49 PM] Friends don’t let friends use global extension functions. A diatribe for another time, but improperly named global extension functions can smell a lot like prototype pollution.

An extension function named “String.toUUID()” was supposed to be scoped to a specific class but was not marked as private. It did not, in fact, parse standard UUID strings into UUIDs; it parsed a custom hex format instead.

Once I switched to the correct parsing, the company was populated correctly.
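In Java terms, the mix-up looks something like the two hypothetical helpers here: UUID.fromString handles the standard dashed form, while a compact 32-hex-digit form needs its own parser. Giving each an explicit name (or keeping such helpers private, as the Kotlin extension was meant to be) makes it much harder to call the wrong one. This is an illustration, not Masset’s code.

```java
import java.util.UUID;

// Illustration of two UUID parsers that are easy to confuse behind a generic name.
final class UuidParsing {
    private UuidParsing() {}

    /** Standard dashed form, e.g. 123e4567-e89b-12d3-a456-426614174000. */
    static UUID parseStandard(String s) {
        return UUID.fromString(s);
    }

    /** Compact form without dashes, e.g. 123e4567e89b12d3a456426614174000. */
    static UUID parseCompactHex(String s) {
        if (s.length() != 32 || !s.chars().allMatch(UuidParsing::isAsciiHex)) {
            throw new IllegalArgumentException("expected 32 hex digits: " + s);
        }
        long msb = Long.parseUnsignedLong(s.substring(0, 16), 16);
        long lsb = Long.parseUnsignedLong(s.substring(16), 16);
        return new UUID(msb, lsb);
    }

    private static boolean isAsciiHex(int c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
}
```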

[4:51 PM] SUCCESS! User authenticated using SSOReady SAML and logged in to Masset.

Body of Work

In the end, it only took us two endpoints to get this working, /discovery and /callback. And a bunch of UI updates to our login page. Honestly, a pretty small chunk of work. We did get it working within an afternoon. So that held water.

The /discovery endpoint takes the username as input and returns authentication settings, including the SAML redirect URL if needed. Internally, there is a bit of nuance with this endpoint in that it could expose sensitive information if not implemented correctly, so make sure to handle that with care.
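A rough sketch of that flow, under assumptions: Company, CompanyLookup, and SamlRedirectClient are hypothetical stand-ins for Masset’s repository and the hand-written SSOReady client, and the company id doubles as the SSOReady external organization id, as described in the diary. Because the response depends only on the email’s domain, this version doesn’t reveal whether a particular user exists, which is one way to limit what the endpoint exposes.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of /discovery: company by domain -> SAML enabled? -> SSOReady redirect URL.
final class DiscoveryService {
    enum LoginType { PASSWORD, SAML }

    record Company(UUID id, Set<LoginType> permittedLoginTypes) {}

    /** samlRedirectUrl is null when SAML isn't used. */
    record DiscoveryResponse(Set<LoginType> permittedLoginTypes, String samlRedirectUrl) {}

    interface CompanyLookup {
        Optional<Company> findByEmailDomain(String domain);
    }

    interface SamlRedirectClient {
        /** Would call SSOReady with the company id as the external organization id. */
        String getRedirectUrl(String organizationExternalId);
    }

    private final CompanyLookup companies;
    private final SamlRedirectClient ssoReady;

    DiscoveryService(CompanyLookup companies, SamlRedirectClient ssoReady) {
        this.companies = companies;
        this.ssoReady = ssoReady;
    }

    DiscoveryResponse discover(String username) {
        String trimmed = username == null ? "" : username.trim();
        int at = trimmed.lastIndexOf('@');
        if (at <= 0 || at == trimmed.length() - 1) {
            return passwordOnly(); // not an email: fall back to username/password
        }
        Optional<Company> company =
            companies.findByEmailDomain(trimmed.substring(at + 1).toLowerCase(Locale.ROOT));
        if (company.isEmpty()) {
            return passwordOnly(); // unclaimed domain
        }
        Company c = company.get();
        // Only call SSOReady when the database says this company uses SAML.
        String redirectUrl = c.permittedLoginTypes().contains(LoginType.SAML)
            ? ssoReady.getRedirectUrl(c.id().toString())
            : null;
        return new DiscoveryResponse(c.permittedLoginTypes(), redirectUrl);
    }

    private static DiscoveryResponse passwordOnly() {
        return new DiscoveryResponse(Set.of(LoginType.PASSWORD), null);
    }
}
```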

The /callback endpoint takes the SAML access code from SSOReady and redeems it with their service to get authentication information. Then it authenticates the user in Masset. Again, not a hard endpoint, but a possibly sensitive one that should be handled with care.

Remaining Items

In all fairness, I did cut a lot of corners when doing this Proof of Concept implementation. Even though the happy path is complete, I think it’s fair to call out the pieces that still need doing:

  1. Lock down of logins based on configured types. (if company is configured as SAML, username/password shouldn’t work)
  2. EULA/Privacy policy acceptance. Since JIT is possible, the UI needs to add a step to force those users to accept the EULA/Privacy policies. Thanks SOC2!
  3. Cleaned up UI. I got it to functional, not beautiful.
  4. Error pages. Need to add some error states/pages.
  5. Add state for better tracking. SSOReady supports a state parameter. We should use it to better lock things down.
  6. Security Review. A few pieces here need to be reviewed by our security team to ensure we didn’t open any accidental holes.
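For item 5, one common pattern is a random, single-use state value bound to the browser session that started the login. This sketch assumes the state sent when requesting the redirect URL comes back when the access code is redeemed; the class name, the session binding, and the in-memory store are hypothetical, and a real version would also expire unused entries.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of single-use state values for the SAML round trip.
final class SamlStateStore {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final Map<String, String> sessionIdByState = new ConcurrentHashMap<>();

    /** Creates a fresh, unguessable state bound to the session that started the login. */
    String issue(String sessionId) {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        String state = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        sessionIdByState.put(state, sessionId);
        return state;
    }

    /** Consumes the state; true only if it was issued for this same session. */
    boolean verify(String state, String sessionId) {
        if (state == null) {
            return false;
        }
        String expected = sessionIdByState.remove(state); // single use, even on mismatch
        return expected != null && expected.equals(sessionId);
    }
}
```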

All in, we’re probably talking another 8-12 hours of work to button everything up. So while that is longer than an afternoon, it’s still a great turnaround time.

Conclusion

At the end of the day, this was a fun exercise if nothing else. It was fun to track my own development process and see how I think through a solution and where I drew the line on “follow up on this later” items.

I think that the “SAML proxy” concept that SSOReady advocates can have significant implementation benefits over a full IdP migration. I was pretty impressed with their documentation, and their APIs and web app were simple and just worked.

All in all, not a bad afternoon’s worth of work!
