Welcome to my sixth update as Ruby Central’s security engineer in residence, sponsored by AWS.
My goal is to write a short update every week, chronicling what I’ve been working on, and reminding myself that I was, in fact, productive.
Fixing a common source of ONCALL pages
As I mentioned last week, I had found that by far our most expensive query was for reverse dependencies of a gem. I shipped my fix last weekend, and it sure made a difference.
Fixing N+1 Queries
Unfortunately, most of my week was spent on RubyGems.org operational issues. I had noticed that many of our slowest endpoints had a very large number of queries being executed, and I spent a bunch of time digging into root causes in DataDog to figure out why. It turned out we had a spate of N+1 queries, and heavy automated usage of the site’s API (along with scraping of the HTML pages) caused heavy load on those endpoints.
As a result, our postgres database got overloaded several times during the week, leading to multiple members of the oncall rotation (including yours truly) to be paged over and over. It wasn’t fun.
Fortunately, the Rails ecosystem has several tools to help narrow in on N+1 queries!
After slapping strict_loading
on several queries and adding in includes
and preload
calls, I brought out the big guns to find the remaining offenders.
Setting up the wonderful prosopite gem led to several more fixes. After a hurried emergency deploy on Thursday afternoon, we saw an immediate improvement across the board and the site once again appears to be stable!
Top requests over the past 24 hours
Average time spent per request on Api::V1::VersionsController#show
Problematic N+1 query (getting gem version download count)
Finally, it appears that there are some users who are scraping every gem & version page/endpoint.
We would strongly reccomend that researchers & other automated platforms use the data dumps we provide if possible,
and make decisions on what endpoints to call to refresh data based upon changes in the /versions
file, which is statically generated
and thus causes no load on the rails app or database.
Sigstore
Few updates this week, due to the aformentioned operational work. I did make progress on protobuf compliance, though!