Google Cloud is a sad mess

James Gragg
Feb 5, 2021


As a technology consultant who has worked with hundreds of businesses in every state between astounding success and total disrepair, I have never in my entire career felt compelled to make a public issue out of poor corporate performance, but there are lessons and warnings here for other businesses that are too valuable not to share.

While prototyping a new mobile app business, we first started using the absolutely wonderful set of app tools in Google Firebase.

After having such a great experience with Firebase, when we needed to add some new infrastructure pieces during development, we figured we might as well try Google Cloud instead of AWS.

You quickly learn that Firebase and Google Cloud are kind of the same thing. For example, Firebase Functions are actually Google Cloud Functions, and the same functions can be accessed from either system, each with its own console and a totally different interface.

Essentially, Firebase takes several Google Cloud components and puts a nice wrapper around them, providing an unparalleled suite of tools for app development.

Firebase and Google Cloud have some wonderful features that are spectacularly developer friendly compared to AWS and Azure. This is where Google Cloud really shines.

Where Google Cloud does not shine is reliability, performance and support. All three are poor across several business divisions and critical infrastructure components, to a degree that is absolutely stunning.

A sampling of our ongoing issues in a basic Google Cloud project

Internal database storage (in managed databases!) can run rampant, causing you to pay dramatically more than you should

We are currently being billed for 1000% (!) of our primary database size because the internal log purging mechanism doesn’t work correctly and transaction logs stack up for far longer than they should.

“Engineering is working on a fix.” No ETA.

Shockingly, this issue wasn’t found until we submitted a support ticket. This is absolutely jaw-dropping.

Similarly, managed database replica storage is more than it should be, also causing dramatic over-billing

We’re currently being billed 600% of the size of our replica database, presumably because transaction logs aren’t properly being cleared from it. There is no ETA. I’m not sure it’s even being worked on.

This issue was also not found until we submitted our support ticket.

I’m not sure how many total customers this impacts, but it has to be something like “a lot” because we are doing nothing special at all.
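
To put those percentages in concrete terms, here is a minimal Python sketch of the arithmetic. The 10 GB database size and the $0.25 per GB-month price are made-up placeholders, not our real numbers or Google's actual pricing. The only figures taken from above are the 1000% and 600% billing ratios.

```python
def storage_overbilling(actual_gb: float, billed_percent: float,
                        price_per_gb_month: float) -> dict:
    """Translate 'billed at N% of the real size' into GB and monthly cost."""
    billed_gb = actual_gb * billed_percent / 100.0
    excess_gb = billed_gb - actual_gb  # storage you pay for but do not use
    return {
        "billed_gb": billed_gb,
        "excess_gb": excess_gb,
        "multiple": billed_percent / 100.0,
        "excess_cost_per_month": excess_gb * price_per_gb_month,
    }


# Hypothetical inputs: a 10 GB database at an assumed $0.25 per GB-month.
primary = storage_overbilling(actual_gb=10, billed_percent=1000,
                              price_per_gb_month=0.25)
replica = storage_overbilling(actual_gb=10, billed_percent=600,
                              price_per_gb_month=0.25)

print(primary["multiple"], primary["excess_gb"])  # 10x billed, 90 GB of excess
print(replica["multiple"], replica["excess_gb"])  # 6x billed, 50 GB of excess
```

In other words, 1000% means paying for ten times the data you actually store, and nine of those ten parts are leftover logs.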

Customer-facing HTTP servers randomly stop working

Google Cloud Run serves customer-facing HTTP requests from Docker containers. These randomly stop working due to an internal issue with gRPC, Google’s remote procedure call framework. This issue was reported almost a year ago.

“Several of our customers were facing similar issues. Since its a known issue, the Engineers are currently working on a fix.”

Core services are nearing non-functional in basic scenarios

It takes literally 6 seconds for a Google Cloud Function to return a single row from a Google Cloud database through Google Cloud SDKs due to “cold starts.” Again, these are things designed to function together.

There is no fix and this has been a known issue for years.
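
If you want to see how much of your own latency is cold start, a rough way is to time the first call after an idle period against the calls that follow it. The sketch below is only an illustration in Python: `FakeEndpoint` is a hypothetical stand-in that sleeps longer on its first call, and you would replace it with whatever actually invokes your function. It measures the gap; it does nothing to fix it.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], n: int = 5) -> List[float]:
    """Call fn n times back to back; return each call's duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def cold_start_penalty(durations: List[float]) -> float:
    """First-call latency minus the median latency of the later (warm) calls."""
    if len(durations) < 2:
        raise ValueError("need at least one warm call to compare against")
    return durations[0] - statistics.median(durations[1:])


class FakeEndpoint:
    """Hypothetical stand-in for a real request: slow once, then fast."""

    def __init__(self, cold_s: float = 0.05, warm_s: float = 0.005) -> None:
        self.cold_s = cold_s
        self.warm_s = warm_s
        self.first = True

    def __call__(self) -> None:
        time.sleep(self.cold_s if self.first else self.warm_s)
        self.first = False


durations = time_calls(FakeEndpoint(), n=4)
print(f"cold start penalty: {cold_start_penalty(durations):.3f}s")
```

Against a real deployment the first call is only cold if the function has been idle long enough to be scaled down, so the measurement has to start from an idle state to mean anything.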

Poor database performance during high write volume

This isn’t something you find on AWS, because RDS and Aurora use smarter default database engine tuning, which GC neither matches nor allows you to change. We haven’t bothered reaching out about this yet.

Getting support shouldn’t be harder than the challenges you’re trying to get support for

The nature and art of engineering is a complex beast, and we fully understand that. You have to work through these things, and they never end.

Unfortunately, Google even makes getting support a bafflingly steep task, thoroughly riddled with a completely different set of issues.

Getting paid support is an absolute nightmare — in a hilarious, dystopian twist, we’ve had to desperately and frantically search for a way to give Google money so we can tell them about their own issues.

You literally can’t buy support on an account. You have to submit a request to create a Google Workspace organization — and you can’t convert your existing account. This means you need to sign up as a Google organization, using a different account, and be manually approved.

So we have account A, with our Google Cloud project, and now we have account B, after it was approved, with an organization and no Google Cloud project.

So how is the Google Cloud project in account A supposed to buy the support plan that requires the organization that is in account B?

— Account A cannot be invited to the organization because “the account already exists.”

— The Google Cloud project cannot be migrated to the organization because the organization belongs to account B.

— The Google Cloud project cannot be migrated to account B.

So what are you supposed to do here? After six different billing support cases, including one where I drew the problem out on a whiteboard, we were finally directed to the right place — Google Workspace Support — which told us about an obscure tool they had called Invite Unmanaged Users which could fix this. Great!

We were finally able to get account A into account B’s organization, migrate all of the projects to the organization, and then buy a support plan! Hooray!

We set up users and permissions with our support plan (which is an entirely different can of worms) and then opened a case for some of our database issues. Nice! After some back and forth with a nice fellow, they eventually escalated our issue to an internal team.

18 days later, internal engineering has confirmed multiple issues but none of the issues have an ETA.

We asked support if we could receive a direct contact or account manager to help facilitate handling some of these issues. They said that, yes, these things absolutely exist (i.e. a TAM, or Technical Account Manager), told us to talk to sales, and pointed us to a form to fill out. Great!

Surprisingly, the sales team was the worst interaction by far

That’s not something you expect. After outlining what problems we were trying to solve, we were told, literally verbatim: “We don’t just provide account managers to people.”

We had sort of figured that out since we didn’t have one.

We then asked “how much do we have to pay or have in spend for an account manager?” and they said “I really wouldn’t recommend that. You should use role-based support instead.” (Role-based support referring to the support plan we already purchased.)

When we informed them that, yes, we already had that, they just repeated themselves. Lovely.

So is this supposed to be an encouragement to grow your business and spend with Google Cloud?

Suffice it to say, the “Customer Growth Expert” (the title of these employees) had the opposite effect.

The Google payments system appears to be broken and nobody seems to know how it works or how to fix it

One of the things we needed to do with our Google account was to add a payment method, so we could make and receive payments.

We’ll just add our bank account. Or will we?

Every attempt to add our bank account (just a standard US bank account!) fails with an unknown error. It has been reported many times online with no apparent fix.

We have spent more than two weeks (!) with Google support trying to add a bank account to our account. We still can’t.

After countless rounds of back and forth between us and Google support, and between Google support and internal account specialists, nobody seems to have a clue what is going on or how to fix it.

A warning

How is it possible that so many different business divisions — different tech infrastructure products, different engineering and product teams and different support teams — can fail so dramatically and simultaneously at basic reliability?

I don’t know.

But I do know that our usage pattern in Google Cloud is basic. Running any kind of technical operation will not get much less complex than what we’re doing.

I’ve shared some of this feedback with Google Cloud support, and this was their response:

“Thanks for your honest and transparent feedback. I think it’s really pertinement and I agree with you.

I passed it over to the Product team to make sure your feedback is properly addressed.”

As nice as that is to hear, this feels like yet another piece of feedback sent via paper airplane into a bottomless void.

As an avid AWS user across dozens of businesses and AWS accounts, I know that you will absolutely run into internal AWS issues as well. I’ve personally discovered internal breaking bugs in ELB, RDS and Aurora, to name a few (and getting them resolved is even more painful than with Google Cloud). But, and this is important, these almost always show up in complex edge cases.

Given the fact that we’re running into these issues with GC at a small, basic scale, it is absolutely daunting to imagine what would happen at a larger, more complex one. It’s nauseating to think about; you’re basically completely on your own.

Migrating to AWS would be a painful process, not necessarily due to the technical implications, but because Google has nailed the developer-friendliness of their platform and that’s been missing everywhere else.

It really sucks to have to choose between ease of use, performance and reliability, and to be left out to dry as a small business whenever you need help, no matter what you do. But that seems to be where the cloud game is right now unless you’ve got a seven-figure spend (that is the spend required at AWS to be taken seriously; no idea what the secret amount is at GC, since they wouldn’t tell us).
