{"id":153715,"date":"2007-11-01T00:00:00","date_gmt":"2007-11-01T00:00:00","guid":{"rendered":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/msr-research-item\/detection-of-web-subsites-concepts-algorithms-and-evaluation-issues\/"},"modified":"2018-10-16T20:16:38","modified_gmt":"2018-10-17T03:16:38","slug":"detection-of-web-subsites-concepts-algorithms-and-evaluation-issues","status":"publish","type":"msr-research-item","link":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/publication\/detection-of-web-subsites-concepts-algorithms-and-evaluation-issues\/","title":{"rendered":"Detection of Web Subsites: Concepts, Algorithms, and Evaluation Issues"},"content":{"rendered":"<div class=\"asset-content\">\n<p>Web sites are often organized into several regions, each dedicated to a specific topic or serving a particular function. From a user\u2019s perspective, these regions typically form coherent sets of pages characterized by a distinct navigation structure and page layout\u2014we refer to them as subsites. In this paper we propose to characterize Web site structure as a collection of subsites and devise a method for detecting subsites and entry points for subsite navigation. In our approach we use a new model for representing Web site structure called Link Structure Graph (LSG). The LSG captures a complete hyperlink structure of a Web site and models link associations reflected in the page layout. We analyze a sample of Web sites and compare the LSG based approach to commonly used statistics for Web graph analysis. We demonstrate that LSG approach reveals site properties that are beyond the reach of standard site models. Furthermore, we devise a method for evaluating the performance of subsite detection algorithms and provide evaluation guidelines.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Web sites are often organized into several regions, each dedicated to a specific topic or serving a particular function. From a user\u2019s perspective, these regions typically form coherent sets of pages characterized by a distinct navigation structure and page layout\u2014we refer to them as subsites. In this paper we propose to characterize Web site structure [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"Proceedings of the 2007 IEEE\/WIC\/ACM International Conference on Web Intelligence, WI 2007","msr_chapter":"","msr_edition":"Proceedings of the 2007 IEEE\/WIC\/ACM International Conference on Web Intelligence, WI 2007","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Proceedings of the 2007 IEEE\/WIC\/ACM International Conference on Web Intelligence, WI 2007","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2007-11-01","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2007,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13555],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-153715","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_publishername":"","msr_edition":"Proceedings of the 2007 IEEE\/WIC\/ACM International Conference on Web Intelligence, WI 2007","msr_affiliation":"","msr_published_date":"2007-11-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"Proceedings of the 2007 IEEE\/WIC\/ACM International Conference on Web Intelligence, WI 2007","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"226249","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"rodrigues07-DetectionSubsites.pdf","viewUrl":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-content\/uploads\/2007\/11\/rodrigues07-DetectionSubsites.pdf","id":226249,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":226249,"url":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-content\/uploads\/2007\/11\/rodrigues07-DetectionSubsites.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"eduarda","user_id":31717,"rest_url":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=eduarda"},{"type":"user_nicename","value":"natasamf","user_id":33057,"rest_url":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=natasamf"},{"type":"text","value":"Blaz Fortuna","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/153715","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/153715\/revisions"}],"predecessor-version":[{"id":525303,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/153715\/revisions\/525303"}],"wp:attachment":[{"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=153715"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=153715"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=153715"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=153715"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=153715"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=153715"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=153715"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=153715"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=153715"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=153715"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=153715"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=153715"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/new-cm-edgedigital.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=153715"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}