Sorry about data.sparkfun.com

A look back at what went wrong, why and how we fixed our issues with data.sparkfun.com

Some of you may have noticed that we had trouble with data.sparkfun.com two weekends ago and into Tuesday the 14th. We are up and running again. Basically, our hard drives filled up. We now have larger disks supporting the system, and we are building out a new monitoring plan that should give us better visibility into the health and status of data.sparkfun.com.

Nate’s weather is updating fine!

The longer story: we weren’t monitoring disk usage, at least not in a meaningful way, and not in a way that would prevent the system from locking up. We use a multi-server architecture, but with complete Mongo replication, so the same data lives on all servers. While this helps fault tolerance if a web server or database goes down, it does nothing to prevent disk-full failures. In fact, it ensures that if one server goes down with a disk-full error they all will, because every server holds the same data on disks that are currently the same size.
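To make "meaningful" disk monitoring a bit more concrete, here is a minimal sketch of the kind of check that would have warned us before the disks filled. It is not our actual monitoring setup: the function names and the 80% and 90% thresholds are assumptions picked for illustration, and it only uses Python's standard library.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def classify(percent_used, warn_at=80.0, critical_at=90.0):
    """Map a usage percentage to a status string.

    The thresholds are hypothetical defaults, not values from our setup.
    """
    if percent_used >= critical_at:
        return "critical"
    if percent_used >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    print(f"/ is {pct:.1f}% full: {classify(pct)}")
```

Run on a schedule on every server, a check like this would flag the problem on all of them at roughly the same time, since the replicated data grows everywhere at once. That is exactly why the warning threshold needs to leave enough headroom to add disk space before any server hits 100%.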

As we tried to restart and provision one of our VMs it got into a bad state. We probably let it cycle in that state too long. We eventually killed it, created a new VM, and threw it into the cluster. This was a fine solution but Mongo wanted to fully replicate before it would allow new connections to establish. This data replication took longer than expected.

Data.sparkfun.com is a mouthful to say as well as type, so from here on out I’m just going to call it “Data,” and hopefully the good people at Universal Studios don’t mind a loving tribute.

SparkFun has been reminded that we need to be good caretakers of Data. We need to keep an eye on it to ensure it keeps working for our users. Over the next few days and weeks we are going to plan and execute a series of small under-the-hood changes to ensure reliability and robustness for Data. We want you to have the same great user experience; we just need to be better at managing the streams that come in and fill those disks up. As we make these changes, we plan to inform the user community about what we are doing.

Data is powered by phant, an open source IoT database that is built and maintained by SparkFun. It’s important to remember that phant was, and still is, operating just fine. The problem was in the infrastructure surrounding our implementation and monitoring of core phant. Once we were able to stabilize our VMs, the system started right back up, running at full strength. The team was able to get Data back up around 11:00 AM MST on Tuesday the 14th of July. Seven hours later, 381 streams of data had been updated with over 294,000 data pushes in that timeframe. That is a sign of a stable system being used by a lot of interested people.
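For a rough sense of scale, those figures can be turned into average rates. This is back-of-the-envelope arithmetic only: it assumes the pushes were spread evenly over the seven hours and treats 294,000 as a lower bound, and the function name is made up for illustration.

```python
def average_rates(pushes, streams, hours):
    """Return (pushes per second, pushes per stream), assuming an even spread."""
    per_second = pushes / (hours * 3600)
    per_stream = pushes / streams
    return per_second, per_stream


if __name__ == "__main__":
    per_second, per_stream = average_rates(pushes=294_000, streams=381, hours=7)
    # Prints: 11.7 pushes/s, 772 pushes per stream
    print(f"{per_second:.1f} pushes/s, {per_stream:.0f} pushes per stream")
```

In other words, Data was handling at least a dozen or so writes every second on average right after coming back, which is the kind of steady inflow that fills disks if nobody is watching them.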

Getting started on Data is quick and easy

This is very exciting news for us at SparkFun, as we like seeing an engaged user community and the continued and expanded adoption of Data definitely qualifies. I’m very impressed by the Data user community. I’m glad you like our system, I’m glad you are using it and I hope it continues to be helpful.

Please keep using Data and we’ll do our best to keep it up for you in the future.


9 comments

  • System metrics is a religion. A good one too!

    Take a look at https://codeascraft.com/2011/02/15/measure-anything-measure-everything/ for some ideas for Data. I know that ideas from that one article saved my butt on multiple occasions!

  • Thank You for data.sparkfun.com! I don’t use it yet, but plan on using it when I replace my power-hungry XT machine logging data from my solar panels with a Beagle Bone and data.sparkfun.com. - Steve

  • Your GitHub repo is way out of date. You don’t even have the MongoDB interface in that rev of code.

    • We’ve had a lot of radio silence lately, but the repos are up to date. The phant project is modular and split across many repos, including one for mongo. The full repo for data.sparkfun.com, which uses mongo, is also available. We configured the base phant repo without a mongo dependency for the situations where mongo wasn’t available or needed.

    • Part of us paying more attention to Data will include getting the phant repo up-to-date. More coming in the future. Stay tuned!

  • Link for “Phant” is down.
