In my last blog, I shared some details about why food logging is one of my most important daily quantified self practices. About 5 years ago, I transitioned away from using a paper and pen to record the items I ate at each meal and began to keep an electronic food log in BodyMedia’s iPhone app. This past March, I bought a FitBit Charge HR (which I wrote about here) and chose MyFitnessPal (MFP) as my new food logging tool. I found a third party Chrome extension (MyFitnessPal Data Downloader) that helped export the MyFitnessPal PDF files as CSV that I could easily read into JMP and I wrote a JMP add-in to import and clean up that file. At the time, I liked this solution much better than the one I had developed for BodyMedia, which only permitted food log export as PDF in 4 week chunks and required regex parsing, concatenation and cleanup to obtain an analysis-ready file.
Unfortunately, after a recent Chrome update, the MFP Data Downloader extension was uninstalled and I could no longer find it in the Chrome app store. I reviewed the various possibilities for MFP export from Ernesto Ramirez’s blog on the QS Labs site, but most of the 3rd party tools only offered daily calorie or macro summary data and did not list the food item level information that I wanted. I knew this meant I would need to develop a new strategy for getting my detailed data out of MFP and into JMP.
Obtaining my full food log from MFP
There is a Food Diary page of the FOOD menu in MFP which shows food log data for the current day. I clicked on the green View Full Report (Printable) button near the bottom of this page, which let me select a date range and the details I wanted to export. I’m only interested in the contents of my Food Diary, so I unchecked the other boxes.
Next, I specified a range of dates covering the time I used MFP for my food log, and clicked the Change Report button. Now I could see each day’s food item list in a separate HTML table within the web page, which MFP refers to as a printable report. Foods are listed under headers for each meal. Since each day’s food items are stored in a separate table, when I tried JMP’s Internet open option for HTML tables, I got a separate JMP table for each day in my food log, but lacking date information in the file-not helpful.
On a side note, I could probably go back and log foods for past dates if I was ever interested in getting my historical data into electronic format at some point-something BodyMedia wouldn’t allow me to do.
By selecting all the text on the printable report page and pasting it into a JMP script window (which is essentially a text editor), I obtained the raw text from the web report page. I deleted all rows above the From: date row and did a test import into JMP using File > Import as Data in JMP. Clearly, the structure was not right, but I was relieved to see that the table was tab delimited between fields. BodyMedia’s PDF files had been space delimited between fields, and with spaces also embedded in item names, that had presented certain challenges. The default Import as Data options clearly had trouble with item names containing commas, so I knew there would be more work to do before I could import this table correctly.
if you’re not using JMP, you could paste your web page text into any text editor to get to this point.
Optimizing my text import
Now I began to tweak my import settings. I saved my text as a text file, clicked File > Open, and chose Data with preview. I noted the data start position as row 1, with no column headers. After the initial import, I opened and edited the Source script embedded in my JMP table to rename the columns to the values shown on row 7 and re-ran my import script. I could have manually changed the names of the columns, of course, but changing my Source script made the process more repeatable. Now my imported table looked like this.
Next, I added some new columns to the table-Date, Meal, Item Name, and Serving Size-which I planned to use to hold those pieces of information when parsed out of the plain text. I revisited the regular expression parsing code which I had used in my BodyMedia food log import project. Those files had started out as PDFs which I’d saved as text files, and the table structure for days and meals was similar. Date and meal values occurred just once for all items underneath the header rather than appearing on each item’s line like I needed them to. I also needed to remove daily totals and other extraneous information from the table.
I tweaked my regular expression matching loop for Date to pick up the first occurrence of date in the expected format and wrote a loop to propagate the current meal down the table until a new date was reached. Getting JMP to recognize MFP’s format as a date column presented a minor annoyance, through no fault of MFP. The MFP date format uses the full month’s name with spaces and a comma (e.g., March 29, 2015) which isn’t a supported date format in JMP. I took the easy way out and used Find/Replace to substitute short month names for full Month names and removed the spaces and commas during the regex extraction to change the format into one that that JMP does recognize (e.g., Mar282015). This became the input format for my new date column, although I changed the display format in my table to m/d/y. The regex I used for matching the MFP Date format was “^(January|February|March|April|May|June|July|August|September|October|November|December) ([0-9]|[0-9][0-9]), (20[0-9][0-9])(.*$)”
I changed my Meal regex loop to catch the first identifier for each meal (“^(Breakfast|AM Snack|Lunch|Dinner|Snacks)(.*$)”), then looped through all rows and extended the meal values down through all the food items till I reached the next meal. I then matched and deleted extra rows which were missing calorie information, or contained table header or daily total information I didn’t need. I also used find/replace to remove the g and mg embedded inside the macro- and micro-nutrient column values, and added units properties to those columns in my JMP data table to store that information. Once the g and mg characters were gone from the macro- and micronutrient values, I could change those column types from character to numeric.
I was almost done! But I still had to resolve another annoyance about the MFP food item name information. For some reason, MFP combines the item name and serving size information in their report table so that these two pieces of information are contained in the same column, but not in a well-delimited way. There is generally a comma or a dash between them, but there seemed to also be able to have a comma or dash inside the item name (e.g., Chicken, breast, roasted, meat only), so I could not use either character reliably for dividing the two pieces of information.
In the previous iteration of my import using the MyFitnessPal Data Downloader, I had left the portion size information and the calorie information for each item in the same column. Generally, I find serving size information is not all that useful since there is often a disparity between the listed weight for a serving size and the quantity listed in cups, tablespoons, etc. However, retaining serving size information with the food name meant that I had many more unique food item names than I should have, due to eating differing portion sizes of many food items over time. This made downstream steps like recoding food item names significantly more time-consuming when I downloaded new data.
I constructed a new regex that was able to split the name and portion apart into a separate column based on a pattern that matched a space, a comma and one or more digits (possibly followed by a period). With the completion of this step, I was finally able to get my last 6 months’ worth of item level MFP food log data into a JMP table with the structure I wanted.
Cleanup of my imported data table
But even once I had my data table in JMP, my work was not done. The variety of food items in the MFP database is a blessing, but also a curse. The high level of redundancy means that if you want to do anything meaningful with your data, it’s absolutely necessary to recode similar food item names to improve their consistency. Fortunately, this is a process that can be repeatable using the JMP Recode platform, which allows me to save and reload a script with the paired raw and cleaned values. I had already developed a set of Recode scripts from my previous iteration of MFP file import. One downside of this approach is that each time I import new data, my Recode scripts need to be updated to add cleaned names for new items that haven’t appeared in my log before. I would be happy if MFP would handle this cleanup step for me, but given the extensive database they manage and the ability of users to add custom items, I understand the process of consolidating item names would be quite difficult to manage.
However, for individual users trying to make sense of their datA, I truly believe that consolidating item names is an essential step to aggregate information across similar items. It’s almost impossible to choose the same name for an item consistently from a large database like MFP. I probably used about 10 different names for cooked chicken breast during a six-month period, not to mention the truly ridiculous number of variations on names for different chocolate types which appeared in my food log. I like detail as much as the next QS person but it’s important to draw the line somewhere! Recoding food items is definitely a bit of a judgment call, and the granularity you choose is up to you. I probably could lose some of the detail for my chocolate categories, but everything has their area of interest. Here’s a view of one section of my chocolate item list, showing how I grouped similar items into consolidated names.
Similarly, I have a Recode script that assigns foods into Primary Food Groups, and I update this for each batch of new items. This allows me to have a higher-level hierarchy for food item graphs and tables I might create.
Since I changed the format of the item name data to exclude quantity information, I had to spend some time cleaning up and changing my item name Recode script to match the format of my newly imported data table. Fortunately, I was able to use the regex I had already developed for processing my table to process my item recoding script, removing item quantity from the information to be matched, and using a JMP formula column to reconstruct that part of my script. I did run into some other challenges along the way most of which seem to be related to whitespace differences between how the Chrome extension exported my data and how I imported it as text. Most people have never gone down this recode path before, so those issues would not be a problem for you.
I was irritated to have to redo my MFP import process, but at least I ended up with a better data table and a cleaner script for future recoding at the end of it. My item name and quantity information have now been separated, though I will still have to update my recode script every time I import my data because new items appear in my food log almost daily. However, the scope of the problem will be greatly lessened by excluding the (often unique) amounts from the item name. I hope to eventually script all the steps in this process and replace my older MFP import add-in with a new one.
Like other companies in the food logging space, MFP is sitting in a treasure trove of information about what works and doesn’t work for people attempting to lose weight “in the wild.” I really like how companies like Jawbone are clearly mining their data for interesting insights and sharing them on their data science blog. Thus far, the MFP blog entries that I have seen appear to be more in the realm of personal stories, nutrition and coaching advice and little in the way of insight from user data. I hope that might change in the future.
MFP’s Director of Data Engineering and Data Science is Chul Lee and I found an interview here indicating that he is interested in many of the topics I have been focused on, like item cleanup (I noted the recent addition of “Verified” foods to the MFP database) and food categorization. It would be helpful to see MFP tackle food categories in a standardized way and surface that information through a data export feature for customers who want to analyze their data in external tools like I do. There are certainly published classification schemes for foods that they could choose to make use of, if they could match database items to those groupings in a useful way.
I also see that there are many customers on the MFP support pages asking for better export tools. A simple search of data export within the last year showed 11 pages of matches, including these results on the first page, several of which asked about Excel export.
I can only guess that the lack of attention to a CSV data export feature reflects other more pressing priorities for the MFP development team. Hopefully it doesn’t reflect the desire to keep customers within their app and web page, where the company derives money from ads.
It seems like providing a better CSV export tool would actually help alleviate a second class of requests for the MFP development team. Currently, MFP’s graphics options for visualizing food log data are extremely basic. I found plenty of requests for improvements to supported graphs when I searched for “bar chart” on the site. Many of these requests could be helped along by allowing customers to easily export their data and explore it in other graphing and analysis tools.
Now that I have imported my own data into JMP, I have full flexibility to create whatever type of visualization or analysis that I want. I often create tree maps that display my food items within primary food categories like the one below, based on the data I imported.
I think this kind of graph is extremely helpful. JMP offers filtering and grouping options that MFP would have a hard time supporting. Using my detailed food log data, I can recreate the same calories eaten by day or week bar chart that MFP offers, and even pair it with my treemap in a dashboard with a selection filter like the one below to visualize how the breakdown of my food categories changes over selected weeks of the year. It’s up to me to determine how I want to explore the data I have been collecting for over half a year.
I do understand that MFP is a free tool, and as a customer of freeware, I don’t have much leverage to encourage them to change their export tools to make my personal data more accessible. But I hope that MFP will decide to make all users’ paths to their data easier in the future, whether by improving their export mechanisms or opening up an API for partners who want to do it. I’ll continue to use MFP as a data entry tool for the time being, but I certainly will be tempted to explore other food logging options in the future if they offer easier data export routes.