{"id":162193,"date":"2012-03-01T00:00:00","date_gmt":"2012-03-01T00:00:00","guid":{"rendered":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/msr-research-item\/boosting-attribute-and-phone-estimation-accuracies-with-deep-neural-networks-for-detection-based-speech-recognition\/"},"modified":"2018-10-16T20:12:30","modified_gmt":"2018-10-17T03:12:30","slug":"boosting-attribute-and-phone-estimation-accuracies-with-deep-neural-networks-for-detection-based-speech-recognition","status":"publish","type":"msr-research-item","link":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/publication\/boosting-attribute-and-phone-estimation-accuracies-with-deep-neural-networks-for-detection-based-speech-recognition\/","title":{"rendered":"Boosting Attribute And Phone Estimation Accuracies With Deep Neural Networks For Detection-Based Speech Recognition"},"content":{"rendered":"<div class=\"asset-content\">\n<p>Generation of high-precision sub-phonetic attribute (also known as phonological  features)  and  phone  lattices  is  a  key  frontend component  for  detection-based  bottom-up  speech  recognition.  In this  paper  we  employ  deep  neural  networks  (DNNs)  to improve    detection  accuracy  over  conventional  shallow  MLPs (multi-layer perceptrons) with one hidden layer. A range of DNN architectures  with  five  to  seven  hidden  layers  and  up  to  2048 hidden  units  per  layer  have  been  explored.  Training  on  the  SI84 and testing on the Nov92 WSJ data, the proposed DNNs achieve significant  improvements  over  the  shallow  MLPs,  producing greater than 90% frame-level attribute estimation accuracies for all 21 attributes tested for the full system. On the phone detection task, we also obtain excellent frame-level accuracy of 86.6%. With this level  of  high-precision  detection  of  basic  speech  units  we  have opened  the  door  to  a  new  family  of  flexible  speech  recognition system  design  for  both  top-down  and  bottom-up,  lattice-based search strategies and knowledge integration.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Generation of high-precision sub-phonetic attribute (also known as phonological features) and phone lattices is a key frontend component for detection-based bottom-up speech recognition. In this paper we employ deep neural networks (DNNs) to improve detection accuracy over conventional shallow MLPs (multi-layer perceptrons) with one hidden layer. A range of DNN architectures with five to seven [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"IEEE SPS","msr_publisher_other":"","msr_booktitle":"ICASSP 2012","msr_chapter":"","msr_edition":"ICASSP 2012","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"ICASSP 2012","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"Sabato Siniscalchi, Chin-Hui Lee","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2012-03-01","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2012,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13554],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-162193","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"IEEE SPS","msr_edition":"ICASSP 2012","msr_affiliation":"","msr_published_date":"2012-03-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"ICASSP 2012","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"206109","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"DNN-Detector-ICASSP2012.pdf","viewUrl":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/DNN-Detector-ICASSP2012.pdf","id":206109,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":206109,"url":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-content\/uploads\/2016\/02\/DNN-Detector-ICASSP2012.pdf"}],"msr-author-ordering":[{"type":"text","value":"Sabato Siniscalchi","user_id":0,"rest_url":false},{"type":"user_nicename","value":"deng","user_id":31602,"rest_url":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=deng"},{"type":"text","value":"Chin-Hui Lee","user_id":0,"rest_url":false},{"type":"user_nicename","value":"dongyu","user_id":31667,"rest_url":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=dongyu"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/162193","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/162193\/revisions"}],"predecessor-version":[{"id":524373,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/162193\/revisions\/524373"}],"wp:attachment":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=162193"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=162193"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=162193"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=162193"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=162193"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=162193"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=162193"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=162193"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=162193"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=162193"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=162193"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=162193"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=162193"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}