Incidents | Buttondown
Incidents reported on status page for Buttondown
https://status.buttondown.com/

External event backlog recovered | Wed, 27 Aug 2025 03:12:13 +0000 | https://status.buttondown.com/#bb8be50e2f0864fbf93efa44eaeac7eefcf2ee8a61d9a8f9f80542fcca62ede8
External event backlog went down | Wed, 27 Aug 2025 03:00:56 +0000 | https://status.buttondown.com/#bb8be50e2f0864fbf93efa44eaeac7eefcf2ee8a61d9a8f9f80542fcca62ede8

Dashboard is unavailable | Fri, 22 Aug 2025 17:10:00 -0000 | https://status.buttondown.com/incident/710664#383c4890852310650fce0f8e21b9dea2ae95178a38442bf58b29d96c96a0f28a
The team has identified the issue and reverted the commit. The Dashboard is back up and working as expected.

Dashboard is unavailable | Fri, 22 Aug 2025 16:40:00 -0000 | https://status.buttondown.com/incident/710664#22001581118154589aa5f276696c078d8a33d734cc6e15d0f4d32fb9d560e782
We're currently investigating reports that the Dashboard is unavailable to some customers.

p95 response time recovered | Fri, 22 Aug 2025 13:30:56 +0000 | https://status.buttondown.com/#6253629511d4a13ee64c6d460f45e64a8415822a99cf0282fc1e8d33b5e02dab
buttondown.com/applied-cartography recovered | Fri, 22 Aug 2025 13:24:12 +0000 | https://status.buttondown.com/#5b2f8eec844dc13badfe676b02a6481ddfb64de51014993ea20e815b00bf4971
buttondown.com/applied-cartography went down | Fri, 22 Aug 2025 13:16:10 +0000 | https://status.buttondown.com/#5b2f8eec844dc13badfe676b02a6481ddfb64de51014993ea20e815b00bf4971
p95 response time went down | Fri, 22 Aug 2025 13:15:56 +0000 | https://status.buttondown.com/#6253629511d4a13ee64c6d460f45e64a8415822a99cf0282fc1e8d33b5e02dab
buttondown.com/applied-cartography recovered | Fri, 22 Aug 2025 11:00:53 +0000 | https://status.buttondown.com/#3ed69a19ecd07a8b22cdb4aad0c3e624f0af218520d15ad34514173a3075beef
buttondown.com/applied-cartography went down | Fri, 22 Aug 2025 10:58:09 +0000 | https://status.buttondown.com/#3ed69a19ecd07a8b22cdb4aad0c3e624f0af218520d15ad34514173a3075beef
External event backlog recovered | Wed, 20 Aug 2025 19:00:37 +0000 | https://status.buttondown.com/#b9132c0f865e162f5692fb216041a80c59b0bddda1a48ab84adbb5ae37f537a6
External event backlog went down | Wed, 20 Aug 2025 18:57:01 +0000 | https://status.buttondown.com/#b9132c0f865e162f5692fb216041a80c59b0bddda1a48ab84adbb5ae37f537a6
buttondown.com/applied-cartography recovered | Wed, 20 Aug 2025 15:00:22 +0000 | https://status.buttondown.com/#9e81af1ac2ead1da902cbc9803abcee4d10b5b55ceeab3929aaee90009451441
buttondown.com/applied-cartography went down | Wed, 20 Aug 2025 14:54:33 +0000 | https://status.buttondown.com/#9e81af1ac2ead1da902cbc9803abcee4d10b5b55ceeab3929aaee90009451441
External event backlog recovered | Tue, 19 Aug 2025 23:01:08 +0000 | https://status.buttondown.com/#ed1808b8c252efc057fd88b992ee6cc63131857de015357b32afeead00acb84c
External event backlog went down | Tue, 19 Aug 2025 22:01:24 +0000 | https://status.buttondown.com/#ed1808b8c252efc057fd88b992ee6cc63131857de015357b32afeead00acb84c
External event backlog recovered | Tue, 19 Aug 2025 21:01:18 +0000 | https://status.buttondown.com/#5321bac86905263b653e0037764e965a9c4f69003d9b27eed14003a6612adb36
External event backlog went down | Tue, 19 Aug 2025 20:01:13 +0000 | https://status.buttondown.com/#5321bac86905263b653e0037764e965a9c4f69003d9b27eed14003a6612adb36

Emails were not sending from the editor | Mon, 18 Aug 2025 05:30:00 -0000 | https://status.buttondown.com/incident/708185#0826b45e3e4690cb90fbfeaac3a8c03b6c6912dc981510cf67085f5e4f7dcd19
Resolved.

Backlog recovered | Mon, 18 Aug 2025 00:21:10 +0000 | https://status.buttondown.com/#da58b215aacc738829c0f4796f10324d82455f29056fe70b076a3276b175ae98
Backlog went down | Mon, 18 Aug 2025 00:11:07 +0000 | https://status.buttondown.com/#da58b215aacc738829c0f4796f10324d82455f29056fe70b076a3276b175ae98

Emails were not sending from the editor | Sun, 17 Aug 2025 18:00:00 -0000 | https://status.buttondown.com/incident/708185#79a4bf1b5fa24308cc4930be48c7df6be0206e6793d1e096ff36d2825ecaa894

On Sunday, August 17th, 2025, between 2:15pm and 1:30am (the following morning) Eastern Time, sending emails through Buttondown's markdown and fancy mode editors often failed. This did not affect emails sent through the Buttondown CLI, Automations, RSS-to-Email, or the Buttondown API. The failure manifested as a perpetual "Loading…" screen after pressing "Publish," with the email never sending.

The cause was a bug fix rolled out that afternoon. We don't often ship changes to Buttondown's code on weekends, but the underlying bug was degrading some customers' ability to write in the editor, so we verified that the fix was valid and rolled it out. However, the fix introduced a bug of its own: immediately after an email's status was changed to `about_to_send` by an author pressing "Publish," our editor would mistakenly decide that it should be set back to `draft`. As a result, most emails were put into `about_to_send` and then subsequently pulled back to `draft`.

Unfortunately, because this change was made on the weekend and our support operates Monday through Friday, we did not notice customers flagging the issue to support until the evening. We identified the issue and rolled out a fix around 1:30am Eastern Time.

We take incidents like this, especially ones affecting core functionality, extremely seriously. We apologize to anyone who ran into this issue, especially with time-sensitive emails that were delayed. Going forward, we're preventing incidents like this by adding a monitor that pages the team if an unusually low number of emails has been sent recently, and by adding end-to-end tests that automatically exercise email sending from the editor.
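
As a sketch of what such a low-send-volume monitor might look like: the `Email` model, the threshold, and the `page_team` helper below are illustrative assumptions, not Buttondown's actual code.

```python
from datetime import timedelta

from django.utils import timezone

from emails.models import Email  # hypothetical model
from ops.alerting import page_team  # hypothetical paging helper

# Assumed floor, tuned from historical sending volume.
EXPECTED_MIN_SENDS_PER_HOUR = 50


def check_send_volume() -> None:
    """Page the team if suspiciously few emails were sent in the last hour."""
    window_start = timezone.now() - timedelta(hours=1)
    recent_sends = Email.objects.filter(
        status="sent", sent_at__gte=window_start
    ).count()
    if recent_sends < EXPECTED_MIN_SENDS_PER_HOUR:
        page_team(f"Only {recent_sends} emails sent in the last hour")
```

A check like this catches the failure mode above precisely because it measures the absence of successful sends rather than the presence of errors.
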
If you have any further questions, we'd be happy to answer: support@buttondown.com

External event backlog recovered | Fri, 15 Aug 2025 23:01:05 +0000 | https://status.buttondown.com/#63c38c1cceeaa9a3f6f621043600820ce4df9dadf44d0923712289e478b29c08
External event backlog went down | Fri, 15 Aug 2025 22:01:11 +0000 | https://status.buttondown.com/#63c38c1cceeaa9a3f6f621043600820ce4df9dadf44d0923712289e478b29c08
External event backlog recovered | Tue, 12 Aug 2025 14:01:15 +0000 | https://status.buttondown.com/#98d32b5ace71d46a65d7574486ec2d97d92df2409ccd0dbfade132c64247ffcd
External event backlog went down | Tue, 12 Aug 2025 12:01:15 +0000 | https://status.buttondown.com/#98d32b5ace71d46a65d7574486ec2d97d92df2409ccd0dbfade132c64247ffcd
External event backlog recovered | Sun, 10 Aug 2025 17:00:53 +0000 | https://status.buttondown.com/#0e51f7ecce9d1caabb51f57d4b9199d719bc147bb8d832c4a9e6d258214221fe
External event backlog went down | Sun, 10 Aug 2025 16:01:00 +0000 | https://status.buttondown.com/#0e51f7ecce9d1caabb51f57d4b9199d719bc147bb8d832c4a9e6d258214221fe
External event backlog recovered | Tue, 29 Jul 2025 13:10:57 +0000 | https://status.buttondown.com/#3d8466adaa51464f05ccbf74d1e75c884ac6de6487aeea0d076e78abd941f305
External event backlog went down | Tue, 29 Jul 2025 13:00:45 +0000 | https://status.buttondown.com/#3d8466adaa51464f05ccbf74d1e75c884ac6de6487aeea0d076e78abd941f305
External event backlog recovered | Tue, 29 Jul 2025 12:30:40 +0000 | https://status.buttondown.com/#9762593782ff8175a48a75d080d747c4ab390ea40c1d99a86bad71cb4d42f2d2
External event backlog went down | Tue, 29 Jul 2025 12:20:35 +0000 | https://status.buttondown.com/#9762593782ff8175a48a75d080d747c4ab390ea40c1d99a86bad71cb4d42f2d2
External event backlog recovered | Thu, 24 Jul 2025 06:31:06 +0000 | https://status.buttondown.com/#ec564355e71c90da53c95cfdf1715eac200aaab626537b9f43086988b0c77909
External event backlog went down | Thu, 24 Jul 2025 05:51:11 +0000 | https://status.buttondown.com/#ec564355e71c90da53c95cfdf1715eac200aaab626537b9f43086988b0c77909

App is currently unavailable | Mon, 21 Jul 2025 13:19:00 -0000 | https://status.buttondown.com/incident/623243#bd0d70d55a97d3d86f58baf65ff2364ec5ca6e7903f0fa0699ac4551a40911a9

We've addressed the root cause of the issue, and our systems are now fully operational. We experienced a sudden increase in load that overwhelmed our previous process model, making it difficult for the system to recover gracefully once under stress. To resolve this, we've shifted our load-balancing approach and scaled up, which allows us to run multiple concurrent processes per instance. This not only improves overall capacity and responsiveness but also lets us restart individual processes without placing stress on the entire system, increasing resilience moving forward.
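
Buttondown doesn't name its process manager, but the model described (several worker processes per instance, each restartable in isolation) is what a typical gunicorn-style configuration provides. A minimal sketch, purely illustrative:

```python
# gunicorn.conf.py -- an illustrative configuration, not Buttondown's actual
# setup; the process manager and the specific values are assumptions.
import multiprocessing

# Run several worker processes per instance so one worker can crash or be
# recycled without the whole instance going unresponsive.
workers = multiprocessing.cpu_count() * 2 + 1

# Recycle workers after a bounded number of requests; the jitter staggers
# restarts so they don't all happen at once under load.
max_requests = 1000
max_requests_jitter = 100
timeout = 30
```
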
p95 response time recovered | Mon, 21 Jul 2025 12:55:41 +0000 | https://status.buttondown.com/#59b2d3d6183fc24fb702b39dae84f6ee49f31fc85564a9bb068c7e9c8a11cea5
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 12:51:03 +0000 | https://status.buttondown.com/#e44d82d0767bcff6ae9deb8f0f5f358020b2900cd0116ddc183e972b797b4f38
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 12:39:29 +0000 | https://status.buttondown.com/#e44d82d0767bcff6ae9deb8f0f5f358020b2900cd0116ddc183e972b797b4f38
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 12:26:41 +0000 | https://status.buttondown.com/#45aaaab0927799f9e40563085d4b46c8a5d7a6b2b7355c0eb1098e4c7fecaeb8
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 12:21:35 +0000 | https://status.buttondown.com/#45aaaab0927799f9e40563085d4b46c8a5d7a6b2b7355c0eb1098e4c7fecaeb8
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 12:11:55 +0000 | https://status.buttondown.com/#845ba90579cae93ec9b04b10d13414451fe2e6e4256ae29cbd6930d7677d2efb

App is currently unavailable | Mon, 21 Jul 2025 12:07:00 -0000 | https://status.buttondown.com/incident/623243#18fdbf45b099bbc0934cd1d0832c2d8777f3f42ab44235f54a1dae4efcb97226
The service has been restored, but there may still be intermittent errors and delays in loading the application. We're continuing to investigate the root cause and monitor the status of the application.
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 12:06:04 +0000 | https://status.buttondown.com/#845ba90579cae93ec9b04b10d13414451fe2e6e4256ae29cbd6930d7677d2efb
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 11:50:59 +0000 | https://status.buttondown.com/#647a2ad1ca9990e2cefe54674f09632e1b685af2d4e9287393b8ecd4f0dbe34a
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 11:48:54 +0000 | https://status.buttondown.com/#647a2ad1ca9990e2cefe54674f09632e1b685af2d4e9287393b8ecd4f0dbe34a
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 11:41:54 +0000 | https://status.buttondown.com/#76dd31a523fa3333596acbc6a55006efab68d4af8f4f40fb08fa26c01d9a7779

App is currently unavailable | Mon, 21 Jul 2025 11:35:00 -0000 | https://status.buttondown.com/incident/623243#3bec73a4b52584a3f72d4bc92aa4c200a861b1a9637398bceaeac1aa44f873ee
We're currently investigating reports that the web app is unavailable. This is impacting the Dashboard, login, and hosted archives.

buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 11:30:42 +0000 | https://status.buttondown.com/#76dd31a523fa3333596acbc6a55006efab68d4af8f4f40fb08fa26c01d9a7779
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 10:56:40 +0000 | https://status.buttondown.com/#109146cde054345c29b5191472023425ebdcca1be944dcb9f85d8632422f53a0
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 10:51:53 +0000 | https://status.buttondown.com/#109146cde054345c29b5191472023425ebdcca1be944dcb9f85d8632422f53a0
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 08:35:49 +0000 | https://status.buttondown.com/#8c8d8cb9f94777338ba6ce9be8a024c99b856c90087206764c8df418f06efceb
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 08:33:12 +0000 | https://status.buttondown.com/#8c8d8cb9f94777338ba6ce9be8a024c99b856c90087206764c8df418f06efceb
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 08:08:12 +0000 | https://status.buttondown.com/#2724727ea1df2b125108e1021771d83c29fee37d60c2c8ec25dd4ab33156c03b
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 08:03:00 +0000 | https://status.buttondown.com/#2724727ea1df2b125108e1021771d83c29fee37d60c2c8ec25dd4ab33156c03b
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 07:26:22 +0000 | https://status.buttondown.com/#582ce9bef8b741e50979481eb23939fd1a8e11dc79c811e62721e4d9e58a548b
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 07:05:23 +0000 | https://status.buttondown.com/#582ce9bef8b741e50979481eb23939fd1a8e11dc79c811e62721e4d9e58a548b
p95 response time went down | Mon, 21 Jul 2025 06:05:27 +0000 | https://status.buttondown.com/#59b2d3d6183fc24fb702b39dae84f6ee49f31fc85564a9bb068c7e9c8a11cea5
p95 response time recovered | Mon, 21 Jul 2025 05:21:26 +0000 | https://status.buttondown.com/#35547c9c4c98088873fee3b7645d211f9eee7809d51017363ec262b8ef49b89a
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 05:02:08 +0000 | https://status.buttondown.com/#b60755d8414403a9a06b1f3abe2acfa176638667b1977a0173f65f8373a5701e
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 04:56:52 +0000 | https://status.buttondown.com/#b60755d8414403a9a06b1f3abe2acfa176638667b1977a0173f65f8373a5701e
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 04:53:01 +0000 | https://status.buttondown.com/#67196f1b4dc60ac7f58f798bfde053e419bd0841de171b4b160fc3a7574c038f
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 04:44:22 +0000 | https://status.buttondown.com/#67196f1b4dc60ac7f58f798bfde053e419bd0841de171b4b160fc3a7574c038f
buttondown.com/applied-cartography recovered | Mon, 21 Jul 2025 04:05:04 +0000 | https://status.buttondown.com/#a1e70278b9b37be61bd7ee1937d9138e5a5af228fecbc3e20769191ea8fc1c7b
buttondown.com/applied-cartography went down | Mon, 21 Jul 2025 03:59:22 +0000 | https://status.buttondown.com/#a1e70278b9b37be61bd7ee1937d9138e5a5af228fecbc3e20769191ea8fc1c7b
p95 response time went down | Mon, 21 Jul 2025 03:41:20 +0000 | https://status.buttondown.com/#35547c9c4c98088873fee3b7645d211f9eee7809d51017363ec262b8ef49b89a
p95 response time recovered | Mon, 21 Jul 2025 03:15:35 +0000 | https://status.buttondown.com/#96ff8b7352aeb4d6a3f7a79a606e4fbcdc14a55376603fca81c3ee9c35ce4610
p95 response time went down | Mon, 21 Jul 2025 03:01:34 +0000 | https://status.buttondown.com/#96ff8b7352aeb4d6a3f7a79a606e4fbcdc14a55376603fca81c3ee9c35ce4610
p95 response time recovered | Mon, 21 Jul 2025 02:46:02 +0000 | https://status.buttondown.com/#b0669135e9bd66b5bfdd8fa918ffb809f772b9c1e02596eeaccf74e1e57317db
p95 response time went down | Mon, 21 Jul 2025 02:25:49 +0000 | https://status.buttondown.com/#b0669135e9bd66b5bfdd8fa918ffb809f772b9c1e02596eeaccf74e1e57317db
p95 response time recovered | Mon, 21 Jul 2025 02:11:29 +0000 | https://status.buttondown.com/#6ba22cf4db62e3e058f305ef8805d929c6117ad9cf041c2e737d11459ec29c6e
p95 response time went down | Mon, 21 Jul 2025 01:51:17 +0000 | https://status.buttondown.com/#6ba22cf4db62e3e058f305ef8805d929c6117ad9cf041c2e737d11459ec29c6e
p95 response time recovered | Mon, 21 Jul 2025 00:41:26 +0000 | https://status.buttondown.com/#1c7c9356dc50d3f729369c7347630d8badbfc57063cdce9d5ed5e4bcb29bae96
p95 response time went down | Sun, 20 Jul 2025 23:05:51 +0000 | https://status.buttondown.com/#1c7c9356dc50d3f729369c7347630d8badbfc57063cdce9d5ed5e4bcb29bae96
p95 response time recovered | Sun, 20 Jul 2025 22:31:18 +0000 | https://status.buttondown.com/#2890fa70e0046093a0d19ca6d84adb28dfd082727be5e053ff09a529304ce3ba
p95 response time went down | Sun, 20 Jul 2025 22:01:49 +0000 | https://status.buttondown.com/#2890fa70e0046093a0d19ca6d84adb28dfd082727be5e053ff09a529304ce3ba
p95 response time recovered | Sun, 20 Jul 2025 21:15:49 +0000 | https://status.buttondown.com/#3032799f1ed513c2004f10f11c9ec457910688b64eb426fda39a1e7195dc48ae
p95 response time went down | Sun, 20 Jul 2025 20:51:31 +0000 | https://status.buttondown.com/#3032799f1ed513c2004f10f11c9ec457910688b64eb426fda39a1e7195dc48ae
p95 response time recovered | Sun, 20 Jul 2025 19:05:48 +0000 | https://status.buttondown.com/#76c311b47d83661e679b694d61089009ab1df823b6979b5ddd9b6ddd45d295d1
p95 response time went down | Sun, 20 Jul 2025 18:05:51 +0000 | https://status.buttondown.com/#76c311b47d83661e679b694d61089009ab1df823b6979b5ddd9b6ddd45d295d1
External event backlog recovered | Sat, 19 Jul 2025 14:51:24 +0000 | https://status.buttondown.com/#7066cc3509b91510b22e5f3ddca3e6ed7a5ffb4dbed53ec43c01e914e6e1e13b
External event backlog went down | Sat, 19 Jul 2025 14:41:26 +0000 | https://status.buttondown.com/#7066cc3509b91510b22e5f3ddca3e6ed7a5ffb4dbed53ec43c01e914e6e1e13b
p95 response time recovered | Fri, 18 Jul 2025 01:50:45 +0000 | https://status.buttondown.com/#eff8fd40be3edfb2ab4ca7afd7965d08d5e1629142475cb6d5cd5882e7fa9928
p95 response time went down | Fri, 18 Jul 2025 01:35:50 +0000 | https://status.buttondown.com/#eff8fd40be3edfb2ab4ca7afd7965d08d5e1629142475cb6d5cd5882e7fa9928
p95 response time recovered | Fri, 18 Jul 2025 01:25:45 +0000 | https://status.buttondown.com/#904c6d6660b09c2e29358190bec19ed3b2a5afa7af51328f6152a5631504095b
p95 response time went down | Fri, 18 Jul 2025 01:10:54 +0000 | https://status.buttondown.com/#904c6d6660b09c2e29358190bec19ed3b2a5afa7af51328f6152a5631504095b
p95 response time recovered | Thu, 17 Jul 2025 18:05:43 +0000 | https://status.buttondown.com/#87af05576db68b79499a8fe8cda26f59480901ba49bed9c7db1e3daa07f1ffab
App is currently unavailable | Thu, 17 Jul 2025 17:42:00 -0000 | https://status.buttondown.com/incident/621474#8897b6d9dd258d8df6abe97c13d86ef1020e1a6463a0aed3a388cd4f6c4ebcce

## What was happening

For the past few days, we've been experiencing `thundering herd`-esque downtime every time we deploy at the top of the hour. The top-of-the-hour timing is not actually that uncommon: we serve a lot of RSS traffic. (A fun fact: 80% of our page views come from RSS readers and scrapers, and the vast majority of them are fairly naive, running the equivalent of an hourly cron to ping every single RSS feed they're interested in.)

After investigation, we were able to identify why this started happening recently, even though the above traffic pattern and our general CI/CD posture have remained unchanged.

Part of our firewall system involves checking incoming IP addresses against a deny list culled from a variety of trusted sources. To make the firewall as performant as possible, we aggressively cache that list so that we're not pulling it every time a subscription attempt is made. However, we recently expanded the purview of the database that held those IP addresses to also store aggregate-level data about IPs for telemetry purposes. At a high level, the logic looked something like this:

```
@cache
def get_problematic_ip_addresses():
    ip_address_models = IPAddress.objects.all()
    return {
        ip.ip_address
        for ip in ip_address_models
        if ip.do_not_honor
    }
```

And that logic remained the same! _But_ that backing `IPAddress` model went from a few hundred records to a few hundred _thousand_, replete with a JSON payload for each IP. And because we were caching this, even with rolling deploys, every single time a new server came online it would be aggressively unresponsive as it tried to pull and then collate every single IP address within the 30-second time span of a request.

We've fixed this trivially, by filtering down to the denied IPs in the database rather than in Python:

```
@cache
def get_problematic_ip_addresses():
    ip_address_models = IPAddress.objects.filter(do_not_honor=True)
    return {ip.ip_address for ip in ip_address_models}
```

Going forward, we'll be paying much closer attention to the actual timeline of the deploy process. It was easy to chalk this up to luck of the draw to a certain extent, but such "luck" is scarce and still came at the cost of severely degraded performance. By logging and alerting on startup time and deviations thereof, we'll be able to more actively identify aberrations of this nature in the future.
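
As a sketch of what that startup-time alerting could look like; the baseline storage and the `page_team` helper here are hypothetical, not Buttondown's actual tooling:

```python
import statistics
import time

from ops.alerting import page_team  # hypothetical paging helper

_BOOT_STARTED = time.monotonic()


def report_startup_time(recent_boot_times: list[float]) -> None:
    """Call once the server is ready: record how long boot took, and page
    if it deviates sharply from the recent baseline."""
    elapsed = time.monotonic() - _BOOT_STARTED
    if len(recent_boot_times) >= 5:
        mean = statistics.mean(recent_boot_times)
        stdev = statistics.stdev(recent_boot_times) or 1.0
        # A boot taking far longer than usual (like the cache-warming
        # pathology above) should page before customers notice.
        if elapsed > mean + 3 * stdev:
            page_team(f"Startup took {elapsed:.1f}s (baseline {mean:.1f}s)")
    recent_boot_times.append(elapsed)
```
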
App is currently unavailable | Thu, 17 Jul 2025 16:10:00 -0000 | https://status.buttondown.com/incident/621474#5fbeb353a7c7d198be4af136c1e87fb07b7367b9191e1269e19d3dde33e9a655
We've identified the issue, and see service being restored. We're continuing to monitor as there may be intermittent errors.
buttondown.com/applied-cartography recovered | Thu, 17 Jul 2025 16:04:40 +0000 | https://status.buttondown.com/#760cf143bafe65d84d1c3217ed547331bd4bd08f008e481fabc93d3e5f953edf
p95 response time went down | Thu, 17 Jul 2025 15:50:45 +0000 | https://status.buttondown.com/#87af05576db68b79499a8fe8cda26f59480901ba49bed9c7db1e3daa07f1ffab
buttondown.com/applied-cartography went down | Thu, 17 Jul 2025 15:50:12 +0000 | https://status.buttondown.com/#760cf143bafe65d84d1c3217ed547331bd4bd08f008e481fabc93d3e5f953edf

App is currently unavailable | Thu, 17 Jul 2025 15:50:00 -0000 | https://status.buttondown.com/incident/621474#33a46e67d146d1bf1652eed822c57b0249fbe0a4d8ca0b593bdc5e0661710edf
We're currently investigating reports that the web app is unavailable. This is impacting the Dashboard, login, and hosted archives.

buttondown.com/applied-cartography recovered | Thu, 17 Jul 2025 15:13:17 +0000 | https://status.buttondown.com/#231362a422ff8d146cf56f8debef37c2582e6c38c7b6048ffc627927bf629478
buttondown.com/applied-cartography went down | Thu, 17 Jul 2025 15:11:55 +0000 | https://status.buttondown.com/#231362a422ff8d146cf56f8debef37c2582e6c38c7b6048ffc627927bf629478
p95 response time recovered | Thu, 17 Jul 2025 02:16:06 +0000 | https://status.buttondown.com/#eb359bad2876cc5316bfa31c14a0295f14c49be769a3828c135402124e09f1e1
p95 response time went down | Thu, 17 Jul 2025 02:05:44 +0000 | https://status.buttondown.com/#eb359bad2876cc5316bfa31c14a0295f14c49be769a3828c135402124e09f1e1

Degraded performance on hosted archives | Wed, 16 Jul 2025 19:23:00 -0000 | https://status.buttondown.com/incident/620933#8035bf51d92b8674b8c32decb664797b00c5c5e53b6e53d40a6a2cc181eeafcc
The issue is resolved, and archives are performing as expected.

p95 response time recovered | Wed, 16 Jul 2025 19:11:36 +0000 | https://status.buttondown.com/#b285439c8de0eee9fcca942b6800ed8776ce2d9ae20017d97e980c1f5f495f4b
buttondown.com/applied-cartography recovered | Wed, 16 Jul 2025 19:04:54 +0000 | https://status.buttondown.com/#c02b0bf88fab32698acec6e03e0edb34527c9ba48d9b8c9f8d4b3f621d6d4331

Degraded performance on hosted archives | Wed, 16 Jul 2025 18:40:00 -0000 | https://status.buttondown.com/incident/620933#e3db4a06e553da476faf09fdcc1dc90e64b20a3ff1cc5247817823cf99a2f166
We're monitoring reports of degraded archive performance caused by an incident [with our hosting vendor](https://status.heroku.com/incidents/2860). This is causing archives to error intermittently.
buttondown.com/applied-cartography went down | Wed, 16 Jul 2025 18:38:52 +0000 | https://status.buttondown.com/#c02b0bf88fab32698acec6e03e0edb34527c9ba48d9b8c9f8d4b3f621d6d4331
p95 response time went down | Wed, 16 Jul 2025 18:35:50 +0000 | https://status.buttondown.com/#b285439c8de0eee9fcca942b6800ed8776ce2d9ae20017d97e980c1f5f495f4b
p95 response time recovered | Wed, 16 Jul 2025 14:56:12 +0000 | https://status.buttondown.com/#9443de54d7793a70824f71a643b9bc407ded28b0990632c2dcd9882920efe93e
p95 response time went down | Wed, 16 Jul 2025 14:46:01 +0000 | https://status.buttondown.com/#9443de54d7793a70824f71a643b9bc407ded28b0990632c2dcd9882920efe93e
p95 response time recovered | Tue, 15 Jul 2025 20:20:24 +0000 | https://status.buttondown.com/#bce6501d09f032bd3a73b60a0c8a96ae48b62e0c767e042d2ac4b122a4dfefdb
p95 response time went down | Tue, 15 Jul 2025 20:10:23 +0000 | https://status.buttondown.com/#bce6501d09f032bd3a73b60a0c8a96ae48b62e0c767e042d2ac4b122a4dfefdb
p95 response time recovered | Tue, 15 Jul 2025 18:40:49 +0000 | https://status.buttondown.com/#111d8c231c438f16c4f316dac2118a664cfeed3f786c2278ee00de1fa35efa9e
p95 response time went down | Tue, 15 Jul 2025 18:15:48 +0000 | https://status.buttondown.com/#111d8c231c438f16c4f316dac2118a664cfeed3f786c2278ee00de1fa35efa9e
p95 response time recovered | Sat, 12 Jul 2025 00:40:49 +0000 | https://status.buttondown.com/#02f2b3f235e360660900b2f4fffe3db6d9c46b323263e2d83d93b79638b5e850
p95 response time went down | Sat, 12 Jul 2025 00:30:46 +0000 | https://status.buttondown.com/#02f2b3f235e360660900b2f4fffe3db6d9c46b323263e2d83d93b79638b5e850
p95 response time recovered | Tue, 08 Jul 2025 19:40:46 +0000 | https://status.buttondown.com/#4c84ca6c4db8c27fb95931512d9ec2d53426738f16f936ae1bd403091a462b2b
p95 response time went down | Tue, 08 Jul 2025 19:25:37 +0000 | https://status.buttondown.com/#4c84ca6c4db8c27fb95931512d9ec2d53426738f16f936ae1bd403091a462b2b
External event backlog recovered | Tue, 08 Jul 2025 02:30:27 +0000 | https://status.buttondown.com/#be48f0964fb066dcef2bd1854d5ddee82db34ceb2eb9ffd1b4fe6dc17a7c691b
External event backlog went down | Tue, 08 Jul 2025 02:00:34 +0000 | https://status.buttondown.com/#be48f0964fb066dcef2bd1854d5ddee82db34ceb2eb9ffd1b4fe6dc17a7c691b
p95 response time recovered | Mon, 07 Jul 2025 15:30:38 +0000 | https://status.buttondown.com/#03637a664875231c157f1c752a8e11326ea68f3442db12c28514fbeb565affb1
p95 response time went down | Mon, 07 Jul 2025 15:00:35 +0000 | https://status.buttondown.com/#03637a664875231c157f1c752a8e11326ea68f3442db12c28514fbeb565affb1
buttondown.com/applied-cartography recovered | Mon, 07 Jul 2025 04:20:37 +0000 | https://status.buttondown.com/#162331393564c93951ce5776a165d1fe45491fdccb6ceec185747a6cd6154283
buttondown.com/applied-cartography went down | Mon, 07 Jul 2025 04:15:28 +0000 | https://status.buttondown.com/#162331393564c93951ce5776a165d1fe45491fdccb6ceec185747a6cd6154283
buttondown.com/applied-cartography recovered | Sun, 06 Jul 2025 14:29:46 +0000 | https://status.buttondown.com/#f7f60983786f061b567226c3b78f4f9ad6f94c772371bb91727e9478da83449f
buttondown.com/applied-cartography went down | Sun, 06 Jul 2025 13:48:29 +0000 | https://status.buttondown.com/#f7f60983786f061b567226c3b78f4f9ad6f94c772371bb91727e9478da83449f
p95 response time recovered | Sun, 06 Jul 2025 06:15:59 +0000 | https://status.buttondown.com/#b88495bcc1a1c52e7cea286f5112d53eabdc527aa13d15ba683ea90dd176dbc0
p95 response time went down | Sun, 06 Jul 2025 06:05:43 +0000 | https://status.buttondown.com/#b88495bcc1a1c52e7cea286f5112d53eabdc527aa13d15ba683ea90dd176dbc0
External event backlog recovered | Tue, 01 Jul 2025 01:00:46 +0000 | https://status.buttondown.com/#f286181b44b4bad87650d70ba56efccc8dea2371fa7776b4bcc3ad4078c06f34
External event backlog went down | Tue, 01 Jul 2025 00:40:36 +0000 | https://status.buttondown.com/#f286181b44b4bad87650d70ba56efccc8dea2371fa7776b4bcc3ad4078c06f34
buttondown.com recovered | Mon, 30 Jun 2025 12:12:51 +0000 | https://status.buttondown.com/#3d21bafd03afa3f518fb30afdbf2f34b88498ecb83a3bf471698266173a03e84
buttondown.com went down | Mon, 30 Jun 2025 12:09:50 +0000 | https://status.buttondown.com/#3d21bafd03afa3f518fb30afdbf2f34b88498ecb83a3bf471698266173a03e84
p95 response time recovered | Sun, 29 Jun 2025 14:20:58 +0000 | https://status.buttondown.com/#e3853f70b50b4becddc2ad66317a16867053b960dcb59b76ff8f05c03b816b64
p95 response time went down | Sun, 29 Jun 2025 13:35:54 +0000 | https://status.buttondown.com/#e3853f70b50b4becddc2ad66317a16867053b960dcb59b76ff8f05c03b816b64
p95 response time recovered | Sun, 29 Jun 2025 13:21:00 +0000 | https://status.buttondown.com/#889677d8782cdb8e332574c1897dec41859ee2851852e68ef534f5517124b2cb
buttondown.com/applied-cartography recovered | Sun, 29 Jun 2025 13:13:03 +0000 | https://status.buttondown.com/#80035ce13e65a35570e74145e87add143ac6d1d24616037706a5ac0de0c0be4d
p95 response time went down | Sun, 29 Jun 2025 12:05:49 +0000 | https://status.buttondown.com/#889677d8782cdb8e332574c1897dec41859ee2851852e68ef534f5517124b2cb
buttondown.com/applied-cartography went down | Sun, 29 Jun 2025 12:05:01 +0000 | https://status.buttondown.com/#80035ce13e65a35570e74145e87add143ac6d1d24616037706a5ac0de0c0be4d
buttondown.com recovered | Fri, 27 Jun 2025 19:58:55 +0000 | https://status.buttondown.com/#35a1af5743a29aa76a0be182e5ff979daaaf8023115a786752cb9bbeaa1481cf
buttondown.com went down | Fri, 27 Jun 2025 19:43:54 +0000 | https://status.buttondown.com/#35a1af5743a29aa76a0be182e5ff979daaaf8023115a786752cb9bbeaa1481cf
buttondown.com recovered | Fri, 27 Jun 2025 19:37:59 +0000 | https://status.buttondown.com/#7ac12c8ac925e7b4a2f3e37b0bf6406a0313a2b34abffb44327613e7dcfeee9b
buttondown.com went down | Fri, 27 Jun 2025 19:31:53 +0000 | https://status.buttondown.com/#7ac12c8ac925e7b4a2f3e37b0bf6406a0313a2b34abffb44327613e7dcfeee9b
p95 response time recovered | Thu, 26 Jun 2025 18:00:43 +0000 | https://status.buttondown.com/#2581f7cbb842c149c9c0d0df5fd2989a801b9ba2d0ba953a4e48d3ac7cd171c9
p95 response time went down | Thu, 26 Jun 2025 17:45:34 +0000 | https://status.buttondown.com/#2581f7cbb842c149c9c0d0df5fd2989a801b9ba2d0ba953a4e48d3ac7cd171c9

Incoming traffic being blocked | Mon, 23 Jun 2025 18:45:00 -0000 | https://status.buttondown.com/incident/607839#29cc2cacaf77e405a152885eb4093083a30a07a1b9516cdf5a9f605e1f9b973e
We've identified that our hosting provider was incorrectly flagging incoming traffic as bot traffic. This issue is now resolved, and all services have been restored.

buttondown.com recovered | Mon, 23 Jun 2025 18:41:49 +0000 | https://status.buttondown.com/#9e5a9160149dbba8a4eccc8c4f43f202ad1da5fb04e2a4d38b511fdefd5fa3d0
buttondown.com went down | Mon, 23 Jun 2025 18:26:50 +0000 | https://status.buttondown.com/#9e5a9160149dbba8a4eccc8c4f43f202ad1da5fb04e2a4d38b511fdefd5fa3d0

Incoming traffic being blocked | Mon, 23 Jun 2025 18:25:00 -0000 | https://status.buttondown.com/incident/607839#79ed7de228dac5f8a4c0f9b789320d5658dc4971b183b7479809e9d26b4c6b0a
We're investigating reports of Vercel error messages on our marketing site, documentation site, and while logging into the Buttondown app. The API and the outbound email backlog are *not* impacted by this incident.
buttondown.com/applied-cartography recovered | Mon, 23 Jun 2025 12:21:15 +0000 | https://status.buttondown.com/#6b56113aef298a9af4f14e3b01bd3ef308a4822006750099f89a81fd2d8e7c00
buttondown.com/applied-cartography went down | Mon, 23 Jun 2025 12:13:11 +0000 | https://status.buttondown.com/#6b56113aef298a9af4f14e3b01bd3ef308a4822006750099f89a81fd2d8e7c00
External event backlog recovered | Sat, 21 Jun 2025 14:50:44 +0000 | https://status.buttondown.com/#22fb28fc4bf6a86ee52fea3a94bb07614d99d27ca028400c977794f84558eb81
External event backlog went down | Sat, 21 Jun 2025 14:30:44 +0000 | https://status.buttondown.com/#22fb28fc4bf6a86ee52fea3a94bb07614d99d27ca028400c977794f84558eb81
External event backlog recovered | Tue, 17 Jun 2025 15:40:37 +0000 | https://status.buttondown.com/#30fd71cc33cc4b774f85a67676a9ac0d27ff3574b31e7e620ae0fe690f3cd08f
External event backlog went down | Tue, 17 Jun 2025 15:20:40 +0000 | https://status.buttondown.com/#30fd71cc33cc4b774f85a67676a9ac0d27ff3574b31e7e620ae0fe690f3cd08f
p95 response time recovered | Mon, 16 Jun 2025 13:30:31 +0000 | https://status.buttondown.com/#7d97f2cd0f87c6cbf71b4967023282c2903154d461bc77b26fc3f93444c7e76d
p95 response time went down | Mon, 16 Jun 2025 13:15:32 +0000 | https://status.buttondown.com/#7d97f2cd0f87c6cbf71b4967023282c2903154d461bc77b26fc3f93444c7e76d
External event backlog recovered | Fri, 13 Jun 2025 13:30:45 +0000 | https://status.buttondown.com/#7184bb9aab4cafb1e1f623fd51819c4180a25efbf7fa023a32d5b2020c5f1f3b
External event backlog went down | Fri, 13 Jun 2025 13:20:45 +0000 | https://status.buttondown.com/#7184bb9aab4cafb1e1f623fd51819c4180a25efbf7fa023a32d5b2020c5f1f3b

Delays in email sending; custom domain registration degraded | Wed, 11 Jun 2025 05:30:00 -0000 | https://status.buttondown.com/incident/600179#9d8199bd755964dddfef56f87f2b7d5806168efa693502fe3259c47409a53fae
The incident is now resolved, and [Heroku](https://status.heroku.com/incidents/2822) is back online. (HugOps to their team!) All services are operational, and Buttondown emails are being sent without delay.
External event backlog recovered | Wed, 11 Jun 2025 01:20:51 +0000 | https://status.buttondown.com/#a04af242de4b32aa31338e62c64d99d4bb282a22934996da4aa8843b88fcaf16
External event backlog went down | Wed, 11 Jun 2025 01:00:56 +0000 | https://status.buttondown.com/#a04af242de4b32aa31338e62c64d99d4bb282a22934996da4aa8843b88fcaf16

Delays in email sending; custom domain registration degraded | Tue, 10 Jun 2025 22:00:00 -0000 | https://status.buttondown.com/incident/600179#86c93dd0ea9349806db5bd0c400fcff492d45a86012eba2ecf76d94bed421034

We're seeing improvements for this ongoing incident:

- Previously delayed emails have now been sent
- Our backlog is clearing in a timely manner
- New custom hosting domains are connecting as expected

But as Heroku is [still reporting this as an active outage](https://status.heroku.com/incidents/2822), we're continuing to monitor.

External event backlog recovered | Tue, 10 Jun 2025 19:30:56 +0000 | https://status.buttondown.com/#2ab8523d1faec53a02284c008b4565ff30aad41967d1e0f6d69966e67b52de33
External event backlog went down | Tue, 10 Jun 2025 18:50:56 +0000 | https://status.buttondown.com/#2ab8523d1faec53a02284c008b4565ff30aad41967d1e0f6d69966e67b52de33

Delays in email sending; custom domain registration degraded | Tue, 10 Jun 2025 14:32:00 -0000 | https://status.buttondown.com/incident/600179#de7febb84dbfb21ab7e5a25bb9402b6bf28ffb71ad0e8f8bf854a37b17c09a90

Our upstream hosting provider, Heroku, is experiencing an [ongoing outage](https://status.heroku.com/incidents/2822). As a result:

- We are unable to scale up our servers to match fluctuations in the number of emails our system is sending. **This means that emails may sit in queue and take longer to send out — but they will eventually send without any intervention necessary.**
- We are unable to register new custom domains for archives.

All other areas of Buttondown are **unaffected**.

- Emails **will eventually send without any intervention**, but may take longer to do so. **If you're waiting on an email to go out, please do not attempt to send it a second time.**
- Custom domains that are already set up will work as usual.
- Custom domain setup for archives will work again [after Heroku's incident is closed](https://status.heroku.com/incidents/2822).
buttondown.com/applied-cartography recovered | Mon, 09 Jun 2025 16:02:56 +0000 | https://status.buttondown.com/#766de8993ac4f150b930be6c5b9d844d3b1512feb26d328411666aa1112dcb02
buttondown.com/applied-cartography went down | Mon, 09 Jun 2025 15:56:56 +0000 | https://status.buttondown.com/#766de8993ac4f150b930be6c5b9d844d3b1512feb26d328411666aa1112dcb02
External event backlog recovered | Sat, 07 Jun 2025 19:10:44 +0000 | https://status.buttondown.com/#1af65af92fb7b016ceed498f2189f3ecf627f34cf1178f638a873e856000010c
External event backlog went down | Sat, 07 Jun 2025 19:00:35 +0000 | https://status.buttondown.com/#1af65af92fb7b016ceed498f2189f3ecf627f34cf1178f638a873e856000010c
External event backlog recovered | Wed, 04 Jun 2025 04:10:50 +0000 | https://status.buttondown.com/#c4661b1734fb8f5e5413116a227f006e550d054d9735d6684419b3fcfbbf8b8e
Backlog recovered | Wed, 04 Jun 2025 04:00:55 +0000 | https://status.buttondown.com/#1b1a5b9bc0cdae5a0e6339e55184db55a7bd072926a21ba0ef509eacfb442f0b
Backlog went down | Wed, 04 Jun 2025 03:40:55 +0000 | https://status.buttondown.com/#1b1a5b9bc0cdae5a0e6339e55184db55a7bd072926a21ba0ef509eacfb442f0b
External event backlog went down | Wed, 04 Jun 2025 03:40:52 +0000 | https://status.buttondown.com/#c4661b1734fb8f5e5413116a227f006e550d054d9735d6684419b3fcfbbf8b8e

Delays and issues with sending | Wed, 16 Apr 2025 21:00:00 -0000 | https://status.buttondown.com/incident/547115#e3fb3a06a6859995dcb01aa8c5cd27177493f842cf9175822db6b73d8c5e3acf

# TL;DR

Bad configuration on one of our self-hosted SMTP servers caused a crash that proved difficult to recover from, leaving lots of emails "stuck" in varying degrees – and their being stuck manifested in a slew of unpleasant ways. We've fixed the configuration, are investing (literally, right at this very moment) in better tooling and alerting, and are architecting a way to prevent this from ever happening again.

## The gory details

Buttondown uses a number of providers to actually send the emails written by authors to their subscribers. In addition to using explicit vendors, we run and maintain our own fleet of servers dedicated to this purpose. We're going to refer to these servers as postal servers, as a reference to the great open source project on which we rely.

On Wednesday morning, we received some automated alerts from our checker system indicating that our backlog of emails was higher than it should be. After digging in a little bit, we realized the reason it was so high was that each individual email was taking a huge amount of time to send out from one specific server. After a couple more minutes, this server got to the point where all it was doing was trying for a minute and then timing out. (Software engineers reading this might already be getting some ideas of what had happened.) We logged into that server and quickly discovered that the issue was with the database storing messages that were pending delivery.
While our initial instinct was that the problem was the overall volume of messages being sent to this particular server, we discovered that the volume was actually secondary to the overall connection count. We were trying to connect to this database from too many worker threads, and it was not set up to recover gracefully or even notify downstream connections of what was happening.

Once we discovered this, the first-order solution was pretty simple: we cycled the database, scaled down the number of workers, and got the connections into a pretty manageable state.

The problem we were now left with was that of recovery. We had around 70,000 messages stuck in purgatory. They were technically pending, but some of them, due to the database connection issues, had actually been sent correctly; some were marked as sent but not actually sent; and so on. We had basically entered a fog-of-war situation where our sources of truth were no longer valid.

Our SOP in these cases is to err on the side of caution. Caution in this case meant hazmatting that specific server: spinning down all of the workers, leaving all of those messages as pending, and then shifting traffic over to another server or vendor to make sure we didn't exacerbate the problem nor accidentally act upon incorrect information. This is exactly what we did. We shifted over traffic, the queue drained, and we resent any emails that we were very, very confident hadn't been sent. We cleared out the problematic server and resumed traffic.

## How we're fixing it

If you've read along this far, you're probably wondering what we're going to do to make this better. The first step, one that is essentially complete by the time we publish this postmortem, is a classic one: add much more monitoring and alerting. We were over-reliant on integration-level and other high-level metrics for these servers, which work well when problems are obvious and well-formed, but not when they're a little further out of the mainstream. To be specific, we already had alerting on pending or stuck messages on a per-server basis, but in order to actually fire those alerts, you needed an active connection to the database, which we couldn't have in this scenario.
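
A sketch of the out-of-band flavor of check that sidesteps that failure mode: probe each postal server from the outside, so an unreachable message database pages just as loudly as an oversized backlog. The hostnames, the health endpoint, and `page_team` are all assumptions for illustration.

```python
import requests

from ops.alerting import page_team  # hypothetical paging helper

# Placeholder hostnames; assumes each server exposes a small health endpoint.
POSTAL_SERVERS = ["postal-1.example.net", "postal-2.example.net"]
MAX_PENDING_MESSAGES = 10_000


def check_postal_servers() -> None:
    for host in POSTAL_SERVERS:
        try:
            resp = requests.get(f"https://{host}/health", timeout=5)
            pending = resp.json()["pending_messages"]
        except Exception as exc:
            # The failure mode from this incident: no answer at all must
            # page just as loudly as a large backlog.
            page_team(f"{host}: health check failed ({exc})")
            continue
        if pending > MAX_PENDING_MESSAGES:
            page_team(f"{host}: {pending} messages pending delivery")
```
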
The second fix is a little bit broader: we need to do a much, much better job of proactively pushing information about these kinds of sending patterns to you, the author — one of the worst feelings is sending an email and being confused because it's marked as sent but you haven't seen it in your inbox. We're going to start erring on the side of oversharing about the state of these things, so you can proactively poke around within the dashboard and understand what might be causing delays in getting your emails to your readers.

## Customer impact

Over the course of the afternoon, approximately 13,000 subscribers across 40 authors experienced some combination of the following:

- Multiple-hour delays before receiving a message
- Not receiving an email at all (though we've redriven these)
- Multiple sends of the same email

## Zooming out

To be blunt, we've had too many incidents lately. We've invested a lot in fixing bugs and stability at an object level over the past six months, but we've done a poor job of investing in stability at an end-to-end infrastructural level, and the past few weeks have driven that point home. Our most important job as a tool is to reliably send your writing to your subscribers. We have not sufficiently invested in the very boring but very important kinds of observability that we needed to, and we're shifting a lot of our roadmap over the next six months to make sure that our ability to diagnose and resolve these issues is much, much stronger than it has been.

If you've read all the way to the end, I know it's not out of rabid curiosity but likely out of frustration: you've trusted us with a job and we haven't been up to the task. We take this stuff seriously, and we're pouring everything we have into it.

Dashboard and archives were timing out | Fri, 11 Apr 2025 01:33:00 -0000 | https://status.buttondown.com/incident/543429#528fb50f837c45b8067e1a81db3daf8deb8fc22fbaddf84f6f1be26d2e74da12
From 8:38pm EDT to 8:50pm EDT, we were serving 503s for around 75% of our incoming requests. This was purely due to a high burst of traffic that our scaling mitigated (albeit not quickly enough!). We're going to look into the problematic routes and harden their performance.

Subscribers are mistakenly being marked as undeliverable. | Wed, 02 Apr 2025 18:20:00 -0000 | https://status.buttondown.com/incident/538844#c89c920252787b2fc61b8db6be13e52394b29dd7e23de5e32631afaa50858a7a
The change has been reverted, and subscribers have been restored to their correct status. There is no action required from authors. There was no impact to subscriber-facing features.

Subscribers are mistakenly being marked as undeliverable. | Wed, 02 Apr 2025 17:45:00 -0000 | https://status.buttondown.com/incident/538844#08ed25b66dbfa6df50c5c1c54d1e36932e151ef19138b9925705cd66626eae3b
The change is still being reverted, but we expect that this will be complete shortly. Please continue to refrain from sending until the fix is complete.
https://status.buttondown.com/incident/538844 Wed, 02 Apr 2025 17:15:00 -0000 https://status.buttondown.com/incident/538844#0256ffd8a94bc10f3d2bcfa6b293498a26e979eabd2d27e800b825f11412b279 The Buttondown dashboard and API are mistakenly marking subscribers as undeliverable. We're in the process of reverting the change.

Large backlog in webhooks and automations https://status.buttondown.com/incident/536193 Sat, 29 Mar 2025 00:18:00 -0000 https://status.buttondown.com/incident/536193#71531a41abee5355cf7f27b03549fb8ef5e593e946661fd272550ff8bebd40a1 We've processed the backlog.

Large backlog in webhooks and automations https://status.buttondown.com/incident/536193 Fri, 28 Mar 2025 20:26:00 -0000 https://status.buttondown.com/incident/536193#2cd84cb381e65f6540b2f3eec09c3fc70c9311a4c4f7d670974a31ee1a582ff9 We've got a very large backlog of webhooks/automations that need processing. We're scaling up (a lot!) in order to do so; no action is required on your end.

Increased spam rate for custom domains https://status.buttondown.com/incident/533622 Tue, 25 Mar 2025 18:51:00 -0000 https://status.buttondown.com/incident/533622#f0a2e8f3b366b9f490abeb29a18b135ec590afac850eefdd52d4fba558728f51 We've confirmed and resent the majority of the affected emails, and are in touch with the authors whose emails we haven't automatically resent on their behalf.

Increased spam rate for custom domains https://status.buttondown.com/incident/533622 Tue, 25 Mar 2025 02:55:00 -0000 https://status.buttondown.com/incident/533622#c1a9b5d10aa84fd801d1c4fa194eb80916c30ebfbd1f63e3df20fd9dcfc8da5a Postmark has reported that [this incident](https://status.postmarkapp.com/notices/bt3ky3r8zlaapqlo-increased-gmail-spam-reports) is now resolved. Our team is continuing to test and monitor.

Increased spam rate for custom domains https://status.buttondown.com/incident/533622 Mon, 24 Mar 2025 23:23:00 -0000 https://status.buttondown.com/incident/533622#668b308b22ba35b08a2942c55ac65d1b11b6c880266c1987d9eca2c43ae9c0c3 Postmark has implemented a fix for [the deliverability issue](https://status.postmarkapp.com/notices/bt3ky3r8zlaapqlo-increased-gmail-spam-reports) that caused their IPs to be flagged by Gmail and other providers. We're actively monitoring and will post an update once this is resolved. If you're sending from a custom domain, continue to hold off on sending large emails until this is fully resolved.

Increased spam rate for custom domains https://status.buttondown.com/incident/533622 Mon, 24 Mar 2025 15:32:00 -0000 https://status.buttondown.com/incident/533622#1a14e0101392e76371081483300cafbf8b1fd6a2fb6807803e6a804e07de07b1 We're tracking [Postmark's incident](https://status.postmarkapp.com/notices/bt3ky3r8zlaapqlo-increased-gmail-spam-reports) that is causing many of their IPs to be flagged by Gmail and other providers. If you're sending from a custom domain, this likely impacts you; refrain from sending large emails unless you absolutely need to.
Emails were timing out https://status.buttondown.com/incident/528333 Fri, 14 Mar 2025 18:31:00 -0000 https://status.buttondown.com/incident/528333#44aadb7b6deca2ce55d74475f840c68d53981a725d51944d2a2f469b8dc5e9cb Here is the not-so-fun thing about running an email service provider: you get malicious actors trying to use your infrastructure for, well, malice — phishing, spoofing, et cetera. We have a lot of defenses in place for this, but we detected someone with a relatively novel approach: passing in problematic URLs that we weren't catching. Our other systems _did_ catch and apprehend this user before they were able to send any emails, but in our haste to push out a solution quickly, we didn't backtest its performance — in particular, it added very serious lag to particularly long emails, causing some of them to time out when you went to send them (either via the UI or the API). Normally, when something like this happens, we just roll back to stem the bleeding; because this was also a security/fraud issue, we opted to roll forward and fix it live. We are all set now, but apologies if you had issues sending over the last 24 hours; we take that critical path seriously, and hope you understand (and forgive) the interruption.

Delays in background processing https://status.buttondown.com/incident/504518 Fri, 31 Jan 2025 20:52:00 -0000 https://status.buttondown.com/incident/504518#11206448c278e2dfdd6e83e111e726658aa27676e44f27a81af78c7e566804c1

# Post mortem

## What broke?

`workerscheduler`, our process for running asynchronous jobs that are scheduled for some date in the future, was hard down for ~six hours. This meant, amongst other things:

1. Outbound emails were down
2. Cron was down
3. Other stuff, but those two dwarf everything else

## Why did it break?

At a very high level, our asynchronous worker schedulers work something like this (none of this is bespoke; it's standard RQ):

1. To enqueue a job, serialize the method name, the arguments you want to pass to that method, and a timestamp — then store that in Redis.
2. To find a job that should be worked on, pull all of the potential jobs, sort them by timestamp, and start running the first one that is ready.
Now, that _the arguments you want to pass_ thing is a bit of a landmine. Consider the following job that we might enqueue:

```python
class Email:
    id: str
    subject: str
    body: str

@job('five_minutes')
def send_email(email: Email, recipients: list[str]):
    for recipient in recipients:
        send_email_to_recipient(email, recipient)
```

Enqueuing this means serializing a Python object that looks like:

```
{
    "method_name": "path.to.module.send_email",
    "arguments": [
        {
            "class": "path.to.module.email",
            "id": "1",
            "subject": "Hi there!",
            "body": "How are you doing?"
        },
        [
            "penelope@buttondown.com",
            "telemachus@buttondown.com"
        ]
    ]
}
```

(This is a simplified example, but it's directionally accurate.)

Now, the tricky thing about this is that emails can get... large. Our `Email` object stores over sixty columns (four of which are some variation of "the fully rendered body for an email in different formats"), and some authors write _extremely lengthy emails_. Some emails, in memory, are >5MB! In the above toy example, we're sending a single email to a list of two recipients. In reality, Buttondown _batches_ the total recipient list for an email based on a number of factors: an author with 30,000 subscribers might have their email batched into groups of 100 subscribers each (for a total of 300 batches). And this is where we get into tricky territory. 300 jobs, each containing a 5MB email: we've suddenly added 1.5GB of data to Redis, and we also force the workerscheduler to deserialize 1.5GB just to figure out which single job to run. This is, in fact, exactly what happened: the Heroku dyno running the workerscheduler has a memory cap of 512MB, and once we got into this state it kept trying to read the entire list of jobs, OOM-ing, and restarting, ad infinitum.

## Okay, so don't store the entire thing in memory!

Right! This is not rocket science. The correct approach is to do something like this:

```python
class Email:
    id: str
    subject: str
    body: str

@job('five_minutes')
def send_email(email_id: str, recipients: list[str]):
    email = fetch_email_from_db(email_id)
    for recipient in recipients:
        send_email_to_recipient(email, recipient)
```

Which means, in exchange for a slight performance hit (because now we're hitting the database to hydrate the email), all we have to serialize is this:

```
{
    "method_name": "path.to.module.send_email",
    "arguments": [
        "1",
        [
            "penelope@buttondown.com",
            "telemachus@buttondown.com"
        ]
    ]
}
```

We do this almost everywhere... except one place: the per-domain rate-limiting logic. We happened to trigger _that one place_, and then we were stuck.

## How did we stop the bleeding?

By clearing out all of the problematically large jobs from the safety (and large memory size) of my laptop:

```python
import django_rq
from rq.job import Job

STRING_OF_JOB_TO_REMOVE = "send_email_to"
QUEUE_NAME = "five_minutes"

queue = django_rq.get_queue(QUEUE_NAME)
job_ids = queue.scheduled_job_registry.get_job_ids()
jobs = Job.fetch_many(job_ids, connection=django_rq.get_connection(QUEUE_NAME))
for job in jobs:
    print(job.description)
    if STRING_OF_JOB_TO_REMOVE in job.description:
        # Pull the oversized job out of the scheduled-job registry.
        queue.scheduled_job_registry.remove(job)
        print("Removing!")
```

Once we did that (and re-ran a bunch of stuff to get things flowing again), we were back to normal (albeit with a big backlog!)

## Why did it take so long to notice/fix?

The shortest answer is: almost all of our observability runs on crons. If crons are broken, then we're flying blind.
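As an aside, the standard pattern for the fix we describe next is a dead man's switch: rather than a cron querying for problems (which requires the cron, and the database, to be healthy), every successful cron tick pings an external monitor, and the monitor pages you if the pings stop. Here's a minimal sketch, assuming a Better Stack-style heartbeat URL; the endpoint and helper names are illustrative, not our actual code:

```python
# Dead-man's-switch sketch. HEARTBEAT_URL is an assumption (a Better
# Stack-style heartbeat endpoint); process_due_jobs is a stand-in for
# the real cron workload.
import requests

HEARTBEAT_URL = "https://uptime.betterstack.com/api/v1/heartbeat/<token>"


def process_due_jobs() -> None:
    ...  # placeholder for whatever the cron actually does


def run_cron_tick() -> None:
    process_due_jobs()
    # Ping only after a successful tick. If the scheduler is stuck in an
    # OOM/restart loop, no ping arrives and the external monitor pages us
    # after its grace period: no database connection required to alert.
    requests.post(HEARTBEAT_URL, timeout=5)
```

The key property is that the alert fires on the *absence* of a signal, so a wedged scheduler can't suppress its own page.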
But that leads us to...

## How do we make sure this never happens again?

- At an object level, we fixed that one problematic code path. We now load emails by ID there instead of serializing the entire object.
- At a meta level, we've added a lot of observability through Better Stack so we're not dependent on our own rails. Most notably, we now get paged if no crons have been executed in the past five minutes: this would have immediately caught the problem.
- At an observability level, I've added some internal tooling for analyzing the backlog. It didn't take us too long to diagnose the issue (~thirty minutes), but that's still not great.

Delays in background processing https://status.buttondown.com/incident/504518 Thu, 30 Jan 2025 16:21:00 -0000 https://status.buttondown.com/incident/504518#aa34ca823374a3f659f84a3b1fc518acc0f467308a34509469baedcda6009371 We've finished redriving the stuck backlog items! We'll follow up with a postmortem later today.

Delays in background processing https://status.buttondown.com/incident/504518 Thu, 30 Jan 2025 15:14:00 -0000 https://status.buttondown.com/incident/504518#0ab48241fbb0aee39aac6658bc52e965070d7443c96f8ae68c965bcd6296e8bc We have identified the cause and are working on a fix. In the meantime, background work that was stuck has started to be processed again (for example, emails stuck in "About to Send" are now sending). **Importantly, do not send your email a second time. Emails that were sent previously and are waiting will be sent out eventually.**

Delays in background processing https://status.buttondown.com/incident/504518 Thu, 30 Jan 2025 09:22:00 -0000 https://status.buttondown.com/incident/504518#94e69cf57038bdf7c85dedc45debf01b6e07748d268783def1fba7e7c11d3415 Degradations to our background processing system are causing email sends, scheduled emails, and other background work to be delayed.

Unable to send from the author-facing app https://status.buttondown.com/incident/502703 Mon, 27 Jan 2025 18:52:00 -0000 https://status.buttondown.com/incident/502703#7aee67838f20605854cc51e04d1007d0504aa9a6f5eec4f7f2a40847473cfd1b We have identified the root cause of this issue, and a fix has been deployed.

Unable to send from the author-facing app https://status.buttondown.com/incident/502703 Mon, 27 Jan 2025 18:30:00 -0000 https://status.buttondown.com/incident/502703#e5e59d38c0d84a4a75f102281b623695873bfc42df78e6136b09f0591daf3293 We are currently investigating reports that some users are unable to send drafts or newsletters.

Author-facing app failing to load https://status.buttondown.com/incident/500977 Fri, 24 Jan 2025 01:15:00 -0000 https://status.buttondown.com/incident/500977#f070dac04ab80b40069a5ae02b4cfbe3dfb1197c08644d94f96772de829ffaff This issue has been identified, and the fix has been shipped.

Author-facing app failing to load https://status.buttondown.com/incident/500977 Thu, 23 Jan 2025 22:30:00 -0000 https://status.buttondown.com/incident/500977#87c9274e9548356fc1e92d92f0c65dcf97a805bd4b5c74395605de58adeb73b6 We are currently investigating reports that the author-facing dashboard is failing to load for some customers.

Author-facing app failing to complete requests https://status.buttondown.com/incident/497693 Sat, 18 Jan 2025 02:26:00 -0000 https://status.buttondown.com/incident/497693#471a6fa81847d6fe06be8632c985720da105b768175f840304dcbe71fa3e64d8 As written above, we've since recovered from the incident.
Author-facing app failing to complete requests https://status.buttondown.com/incident/497693 Sat, 18 Jan 2025 02:21:00 -0000 https://status.buttondown.com/incident/497693#3dfd39ee6bd5379934661b7548dd367d25583889ab8e2e009fba26b7181c0e27 From approximately 12:47pm EST to 2:15pm EST, most API requests coming from the author-facing app failed with a 400 error. This occurred because we rolled out a change to our routing and formatting of those requests; amongst other benign changes, we inadvertently changed the formatting of the HTTP method (GET, POST, etc.) of outgoing requests from uppercase to lowercase (get, post, etc.). This change was tested and passed both our continuous integration and manual testing; however, Heroku (our cloud infrastructure provider) does _not_ support this formatting for HTTP methods, and rejected all inbound requests. Once we discovered the issue, we rolled back the deployment and then rolled forward a change to correctly capitalize those methods.

Some tracked links may not be resolving https://status.buttondown.com/incident/461653 Thu, 14 Nov 2024 20:15:00 -0000 https://status.buttondown.com/incident/461653#48140a0b276496d4fcc9e25332accdb36dc7192c102279116ffa108375f8ade6 We're tracking reports of broken links for folks who have click tracking enabled. We've temporarily switched off click tracking for affected newsletters and are waiting to hear more from our upstream vendor.