
The coolest instance of MILP I have encountered "in the wild" at work was a disaggregation problem. We could only receive vendor reports that contained aggregated statistics, each report aggregating the original data along a different dimension (say one by customer zip code, another by age range, etc.). By lining up these aggregated reports, you could set up an MILP whose solution unwinds all of the aggregation and gets you back to the original data. This gave us way more flexibility in deciding how to use the vendor data.
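To make the idea concrete, here is a minimal sketch on a toy 2x2 table. Everything in it is made up for illustration: the cell labels, the report names (`by_zip`, `by_age`, `by_segment`), and the totals are hypothetical and not from the actual vendor data. It also swaps the MILP solver for brute-force enumeration of non-negative integer tables, which only works at this tiny scale; a real setup would hand the same equality constraints to a MILP solver.

```python
from itertools import product

# Hypothetical hidden cells: counts per (zip, age) combination.
# Index:  0 = (zipA, young), 1 = (zipA, old), 2 = (zipB, young), 3 = (zipB, old)
N_CELLS = 4

# Each report is a list of (set of cell indices it sums over, reported total).
# All numbers here are invented for the example.
REPORTS = {
    "by_zip": [({0, 1}, 3), ({2, 3}, 2)],
    "by_age": [({0, 2}, 4), ({1, 3}, 1)],
    "by_segment": [({0, 3}, 4), ({1, 2}, 1)],  # a third, differently grouped report
}


def feasible_tables(reports, n_cells, max_count):
    """Return every non-negative integer table consistent with all reports.

    Brute force stand-in for a MILP feasibility solve: each reported total
    becomes one equality constraint over the unknown cell counts.
    """
    constraints = [c for report in reports.values() for c in report]
    solutions = []
    for candidate in product(range(max_count + 1), repeat=n_cells):
        if all(sum(candidate[i] for i in group) == total
               for group, total in constraints):
            solutions.append(candidate)
    return solutions


# With only two reports, the hidden table is still ambiguous.
two_reports = {k: REPORTS[k] for k in ("by_zip", "by_age")}
ambiguous = feasible_tables(two_reports, N_CELLS, max_count=5)

# Lining up a third report leaves exactly one consistent table.
recovered = feasible_tables(REPORTS, N_CELLS, max_count=5)

print(len(ambiguous), "candidate tables from two reports")
print("recovered cells:", recovered)
```

In this toy case two reports leave two candidate tables, and the third report narrows it to one. Whether real reports pin down a unique answer depends on how many independent aggregations you can line up relative to the number of hidden cells.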


Was the purpose of aggregating the data to "anonymize" it? If so, this just illustrates how useless anonymizing data is.


Yup, k-anonymity is hard. The state of the art is rapidly evolving, requiring higher and higher k (i.e. coarser, less useful reports). Most companies out there prefer to, um, not notice that.


Yes, and I mostly agree. Most companies just randomize an ID or do some group-bys and consider the data "anonymized", which leaves it susceptible to a variety of correlation techniques (such as setting up this MILP problem). And the more data sets you have with the same underlying masked primary key (e.g. person), the easier it can get. I have seen these weak attempts at hiding information in multiple industries. Just for the record, in my use case there was never an intent to map data to individuals, just to get the most granular information possible about key metrics.


Hello, would this have been on vendor reports from a certain large online advertising and search company?


Ha yes, I probably know you.



