{"id":667476,"date":"2020-06-16T11:11:57","date_gmt":"2020-06-16T18:11:57","guid":{"rendered":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/?post_type=msr-research-item&#038;p=667476"},"modified":"2020-06-16T11:11:57","modified_gmt":"2020-06-16T18:11:57","slug":"efficiently-learning-structured-distributions-from-untrusted-batches","status":"publish","type":"msr-research-item","link":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/publication\/efficiently-learning-structured-distributions-from-untrusted-batches\/","title":{"rendered":"Efficiently Learning Structured Distributions from Untrusted Batches"},"content":{"rendered":"<p>We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume\u00a0<span id=\"MathJax-Element-1-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-1\" class=\"math\"><span id=\"MathJax-Span-2\" class=\"mrow\"><span id=\"MathJax-Span-3\" class=\"mi\">m<\/span><\/span><\/span><\/span>\u00a0users, all of whom have samples from some underlying distribution\u00a0<span id=\"MathJax-Element-2-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-4\" class=\"math\"><span id=\"MathJax-Span-5\" class=\"mrow\"><span id=\"MathJax-Span-6\" class=\"mi\">p<\/span><\/span><\/span><\/span>\u00a0over\u00a0<span id=\"MathJax-Element-3-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-7\" class=\"math\"><span id=\"MathJax-Span-8\" class=\"mrow\"><span id=\"MathJax-Span-9\" class=\"mn\">1<\/span><span id=\"MathJax-Span-10\" class=\"mo\">,<\/span><span id=\"MathJax-Span-11\" class=\"mo\">\u2026<\/span><span id=\"MathJax-Span-12\" class=\"mo\">,<\/span><span id=\"MathJax-Span-13\" class=\"mi\">n<\/span><\/span><\/span><\/span>. Each user sends a batch of\u00a0<span id=\"MathJax-Element-4-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-14\" class=\"math\"><span id=\"MathJax-Span-15\" class=\"mrow\"><span id=\"MathJax-Span-16\" class=\"mi\">k<\/span><\/span><\/span><\/span>\u00a0i.i.d. samples from this distribution; however an\u00a0<span id=\"MathJax-Element-5-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-17\" class=\"math\"><span id=\"MathJax-Span-18\" class=\"mrow\"><span id=\"MathJax-Span-19\" class=\"mi\">\u03f5<\/span><\/span><\/span><\/span>-fraction of users are untrustworthy and can send adversarially chosen responses. The goal is then to learn\u00a0<span id=\"MathJax-Element-6-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-20\" class=\"math\"><span id=\"MathJax-Span-21\" class=\"mrow\"><span id=\"MathJax-Span-22\" class=\"mi\">p<\/span><\/span><\/span><\/span>\u00a0in total variation distance. When\u00a0<span id=\"MathJax-Element-7-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-23\" class=\"math\"><span id=\"MathJax-Span-24\" class=\"mrow\"><span id=\"MathJax-Span-25\" class=\"mi\">k<\/span><span id=\"MathJax-Span-26\" class=\"mo\">=<\/span><span id=\"MathJax-Span-27\" class=\"mn\">1<\/span><\/span><\/span><\/span>\u00a0this is the standard robust univariate density estimation setting and it is well-understood that\u00a0<span id=\"MathJax-Element-8-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-28\" class=\"math\"><span id=\"MathJax-Span-29\" class=\"mrow\"><span id=\"MathJax-Span-30\" class=\"mi\">\u03a9<\/span><span id=\"MathJax-Span-31\" class=\"mo\">(<\/span><span id=\"MathJax-Span-32\" class=\"mi\">\u03f5<\/span><span id=\"MathJax-Span-33\" class=\"mo\">)<\/span><\/span><\/span><\/span>\u00a0error is unavoidable. Suprisingly, Qiao and Valiant gave an estimator which improves upon this rate when\u00a0<span id=\"MathJax-Element-9-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-34\" class=\"math\"><span id=\"MathJax-Span-35\" class=\"mrow\"><span id=\"MathJax-Span-36\" class=\"mi\">k<\/span><\/span><\/span><\/span>\u00a0is large. Unfortunately, their algorithms run in time exponential in either\u00a0<span id=\"MathJax-Element-10-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-37\" class=\"math\"><span id=\"MathJax-Span-38\" class=\"mrow\"><span id=\"MathJax-Span-39\" class=\"mi\">n<\/span><\/span><\/span><\/span>\u00a0or\u00a0<span id=\"MathJax-Element-11-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-40\" class=\"math\"><span id=\"MathJax-Span-41\" class=\"mrow\"><span id=\"MathJax-Span-42\" class=\"mi\">k<\/span><\/span><\/span><\/span>.<\/p>\n<p>We first give a sequence of polynomial time algorithms whose estimation error approaches the information-theoretically optimal bound for this problem. Our approach is based on recent algorithms derived from the sum-of-squares hierarchy, in the context of high-dimensional robust estimation. We show that algorithms for learning from untrusted batches can also be cast in this framework, but by working with a more complicated set of test functions.<\/p>\n<p>It turns out this abstraction is quite powerful and can be generalized to incorporate additional problem specific constraints. Our second and main result is to show that this technology can be leveraged to build in prior knowledge about the shape of the distribution. Crucially, this allows us to reduce the sample complexity of learning from untrusted batches to polylogarithmic in\u00a0<span id=\"MathJax-Element-12-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-43\" class=\"math\"><span id=\"MathJax-Span-44\" class=\"mrow\"><span id=\"MathJax-Span-45\" class=\"mi\">n<\/span><\/span><\/span><\/span>\u00a0for most natural classes of distributions, which is important in many applications. To do so, we demonstrate that these sum-of-squares algorithms for robust mean estimation can be made to handle complex combinatorial constraints (e.g. those arising from VC theory), which may be of independent technical interest.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume\u00a0m\u00a0users, all of whom have samples from some underlying distribution\u00a0p\u00a0over\u00a01,\u2026,n. Each user sends a batch of\u00a0k\u00a0i.i.d. samples from this distribution; however an\u00a0\u03f5-fraction of users are untrustworthy and can send adversarially chosen responses. The goal is then to learn\u00a0p\u00a0in total [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"STOC 2020","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2020-6","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"http:\/\/acm-stoc.org\/stoc2020\/","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13561,13556],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-667476","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-algorithms","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2020-6","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/1911.02035","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Sitan Chen","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Jerry Li","user_id":38305,"rest_url":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jerry Li"},{"type":"text","value":"Ankur Moitra","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/667476","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/667476\/revisions"}],"predecessor-version":[{"id":667479,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/667476\/revisions\/667479"}],"wp:attachment":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=667476"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=667476"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=667476"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=667476"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=667476"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=667476"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=667476"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=667476"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=667476"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=667476"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=667476"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=667476"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=667476"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}