A $58,000 lesson in mid-volume manufacturing
I love my job. After 11 years I still learn something new every day. Most days it’s a fun experience; some days it’s painful. On Monday we learned we had shipped as many as 1,934 MicroView units without a bootloader. This renders the units effectively broken.
To the folks who are receiving broken MicroViews over the next few days and weeks: We are really sorry to have messed up your first impressions of SparkFun, Geek Ammo and the awesome MicroView. We are going to make it right and will be shipping a replacement MicroView for every defective unit that was shipped. If you backed the MicroView Kickstarter, we will contact you by September 12th to let you know whether you were shipped a defective unit. We are still trying to establish how many units are affected. The worst case scenario is that 1,934 units need to be replaced outright. If you are part of the defective batch you will receive two units: one that is missing its bootloader (now and in the coming days) and one that works (by the beginning of November). If you’re willing to try, this is the perfect opportunity to learn some new skills: soldering and bootloader programming. Success means you’ll score a second working MicroView, free of charge.
To the folks that received MicroViews prior to July 18th and all backers of the Learning Kit tier, enjoy them. You should have a fully functional unit.
To the folks that just happen to be reading the SparkFun post today, let me tell you about a $58,000 lesson in mid-volume manufacturing.
If you’d rather use the Arduino IDE, follow all the steps to load a test sketch:
If you follow all these steps and get the error:
avrdude: stk500_recv(): programmer is not responding
avrdude: ser_recv(): programmer is not responding
avrdude: stk500_getsync() attempt 10 of 10: not in sync: resp=0x00
Then you know you’ve got a MicroView that is missing its bootloader.
You don’t need to contact us; we will contact you if we find you’re in the defective batch. If you’ve got questions, concerns, or comments, please send them to firstname.lastname@example.org. We want to hear from you, whatever it is, but please give us at least a few days to respond.
We’ve ramped up to fix all the MicroView units we have in the building, build new MicroViews, and get you a replacement as quickly as we can. We’re aiming for the end of October or early November. Again, we’re really sorry to have messed up. Please sit tight while we build more MicroViews.
Can you fix a defective MicroView yourself? Yes you can, but it’s not easy. In the next few weeks we will post a full tutorial to show you how to reprogram and recover a defective MicroView unit. In the meantime, here’s a breakdown of what is required.
First you need a programmer capable of programming an ATmega. If you have an Arduino, you can use it as a programmer. If you want an inexpensive general-purpose programmer, the Pocket Programmer and the Tiny AVR Programmer are both great options.
Next, six connections are required for in-circuit serial programming (ISP). Three of the connections are located on external pins (easy to connect), and three of the connections are small vias on the internal PCB. This means the end user has to open the MicroView unit and attach (by soldering or holding) three wires to vias on the board, as well as attaching three wires to the exposed pins.
Next you need to run avrdude with this specific HEX file to burn the new firmware (that includes a bootloader!) onto the ATmega328. Once the firmware is loaded make sure you can upload sketches.
If everything is good, disconnect the ISP connections and carefully repack the MicroView into its housing and snap the lens back into place. Do a little dance, take a photo and tweet it @sparkfun because you are so awesome. We will high-five you back.
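As a rough sketch of the avrdude step above, the snippet below assembles a command line that writes an Intel HEX file to the ATmega328's flash over ISP. The file name `microview_combined.hex` is a placeholder, not the real download, and the `usbtiny` programmer type is an assumption that fits USBtiny-style programmers; an Arduino used as a programmer needs a different `-c` value plus port and baud options. The snippet only builds and prints the command; it does not run it.

```python
import shlex


def build_avrdude_command(hex_path, programmer="usbtiny", part="atmega328p"):
    """Return an avrdude argument list that writes hex_path to flash over ISP.

    programmer and part are assumptions for illustration; adjust them to
    match the hardware actually on your bench.
    """
    return [
        "avrdude",
        "-c", programmer,               # programmer type (assumed: usbtiny)
        "-p", part,                     # target AVR part
        "-U", f"flash:w:{hex_path}:i",  # write flash from an Intel HEX file
    ]


# "microview_combined.hex" is a hypothetical file name used as a placeholder.
command = build_avrdude_command("microview_combined.hex")
print(shlex.join(command))
```

Building the argument list separately from running it makes it easy to inspect the command before anything touches the chip.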
Over the past few months we have been building approximately 8,000 units for the wildly successful MicroView Kickstarter. On July 18th, 2014, a new production firmware was created to better test the units. Unfortunately this new test firmware was defective and didn’t include the STK500 (aka Optiboot) bootloader. The test procedure correctly tested the MicroView’s functionality (display graphics, toggled GPIOs, etc.) but did not test the upload functionality. There is a reason for this: enumerating a COM port and uploading a sketch is much slower than pushing production firmware over SPI. Since 2011 we’ve been streamlining production by combining HEX files. This means we combine the HEX of the bootloader and the HEX of the production test code into one HEX file that gets programmed onto the final unit. We build nearly 90,000 products a month. This approach using combined HEX files has worked swimmingly for years. But you can see the Achilles' heel: if the combined HEX is formed incorrectly, the defect can be difficult to detect. On July 18th we started programming units with defective firmware and didn’t know it until August 17th, when customers started contacting us about the problem.
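One way to catch a malformed combined HEX before it reaches the production line is to check that the file actually carries data in the bootloader region. The sketch below is an illustration, not SparkFun's tooling: it parses Intel HEX records and reports whether any non-erased byte lands in an assumed bootloader window of 0x7E00 to 0x7FFF (the usual home of a 512-byte Optiboot on an ATmega328; the real MicroView layout may differ). The `make_record` helper exists only to build sample records for the demonstration.

```python
def hex_data_bytes(hex_text):
    """Yield (address, byte) for every data byte in Intel HEX text."""
    base = 0
    for line in hex_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"checksum mismatch in record: {line!r}")
        count, offset, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:                      # data record
            for i, value in enumerate(data):
                yield base + offset + i, value
        elif rtype == 0x01:                    # end-of-file record
            return
        elif rtype == 0x02:                    # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:                    # extended linear address
            base = int.from_bytes(data, "big") << 16


def has_bootloader(hex_text, start=0x7E00, end=0x8000):
    """True if any non-0xFF byte falls in [start, end). The window is assumed."""
    return any(start <= addr < end and value != 0xFF
               for addr, value in hex_data_bytes(hex_text))


def make_record(offset, data, rtype=0x00):
    """Build one Intel HEX record (demo helper, not part of the check)."""
    body = bytes([len(data), offset >> 8, offset & 0xFF, rtype]) + bytes(data)
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


EOF_RECORD = make_record(0, b"", 0x01)
app_only = "\n".join([make_record(0x0000, b"\x0c\x94\x34\x00"), EOF_RECORD])
combined = "\n".join([make_record(0x0000, b"\x0c\x94\x34\x00"),
                      make_record(0x7E00, b"\x11\x24\x84\xb7"),
                      EOF_RECORD])

print(has_bootloader(app_only))   # False: nothing in the bootloader window
print(has_bootloader(combined))   # True: data present at 0x7E00
```

A check like this only proves that something occupies the bootloader region, not that the bootloader works, so it complements an on-device test rather than replacing it.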
There are (worst case) 1,934 units that got programmed with test firmware but lack the STK500 (aka Optiboot) bootloader. The units otherwise work and should display the test sketch just fine. The problem is that there is no bootloader, so you can’t upload a new sketch.
We build MicroViews in batches of 128, so there is a large number of batch identifiers. We’re still trying to pin down the exact batch numbers (there are a lot of them) and how many units are still in the building. We’ve been doing a great job of building and shipping units ahead of schedule, but we have a mixture of batches still in the building. Once we get everything figured out we will notify the backers who are affected. You can easily test your unit by following the description above.
1) No matter how much it costs, make it right with your users. 1,934 units * $30 for the replacement unit and worldwide shipping = $58,020. This sucks, and we screwed up, and we’re going to do everything we can to make it right. Sorry for messing up that moment of joy when you get your MicroView. We’ll get a replacement to you as quickly as we can.
2) Don’t change production firmware mid-run. We’ve built hundreds of thousands of products, but in small batches. This is the first time that the stakes were so high. In the future, if we have to change the firmware, we’ll send the new firmware through beta testers first to make sure it works.
3) Test the bootloader. In the past we have not tested the bootloader because it required that we enumerate a COM port (which can take as long as 30 seconds) and send a test sketch (another 5-10 seconds). After the MicroView error was discovered we quickly agreed that we should be testing to see if a bootloader is present. This test can be done by resetting the ATmega, sending characters 0x30 0x20, and waiting for a 0x14 0x10 response. This test can be done with a second ATmega on the test jig itself (no computer required) and adds very little time after the programming step.
4) Moving from low volume to mid-volume production requires a very different approach. SparkFun has made this type of mistake before (faulty firmware on a device) but it was on a smaller scale and we were agile enough to fix the problem before it became too large. As we started producing very large production runs we did not realize that quality control and testing would need very different thinking. This was a painful lesson to learn, but these checks and balances are needed. If it hadn’t happened on MicroView, it would have happened on a larger production run someday in the future.
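The bootloader check from lesson 3 (send 0x30 0x20 after reset, expect 0x14 0x10) can be sketched in a few lines. This is a host-side illustration in Python, not the ATmega-based jig described above. The `port` argument is a hypothetical object with `write(bytes)` and a timeout-bounded `read(n)`, such as a serial port wrapper; resetting the target just before calling is left to the caller. The constant names follow the STK500 protocol's names for these byte values, and `FakePort` is a stand-in used only to demonstrate the logic without hardware.

```python
STK_GET_SYNC = 0x30   # "are you there?" command
CRC_EOP = 0x20        # end-of-packet marker
STK_INSYNC = 0x14     # first byte of a healthy reply
STK_OK = 0x10         # second byte of a healthy reply


def bootloader_present(port, attempts=3):
    """Return True if the target answers the sync request like a bootloader.

    port is assumed to expose write(bytes) and read(n) -> bytes, with read
    returning fewer bytes (or none) on timeout. The target must have been
    reset shortly before this call so the bootloader is still listening.
    """
    request = bytes([STK_GET_SYNC, CRC_EOP])
    expected = bytes([STK_INSYNC, STK_OK])
    for _ in range(attempts):
        port.write(request)
        if port.read(2) == expected:
            return True
    return False


class FakePort:
    """Test double that replies with a fixed byte string to every read."""

    def __init__(self, reply):
        self.reply = reply
        self.sent = b""

    def write(self, data):
        self.sent += data
        return len(data)

    def read(self, size):
        return self.reply[:size]


good_unit = FakePort(bytes([STK_INSYNC, STK_OK]))
bad_unit = FakePort(b"")   # no bootloader: reads time out with no data

print(bootloader_present(good_unit))  # True
print(bootloader_present(bad_unit))   # False
```

Retrying a few times is a design choice for the sketch: the first request after reset can be missed, so a single unanswered attempt is weak evidence of a missing bootloader.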
In 12 days SparkFun moves to its new building. I love my job: a new lesson every day.