
Mahesh Tiyyagura

Mahesh Tiyyagura, Hyderabad IN

Patent application number | Description | Published
20090063538 | METHOD FOR NORMALIZING DYNAMIC URLS OF WEB PAGES THROUGH HIERARCHICAL ORGANIZATION OF URLS FROM A WEB SITE - Techniques are described for normalizing dynamic URLs using a hierarchical organization of a web site. Given web pages associated with a web site, an information extraction method is used to generate data structures that represent the content or structure of each of the web pages. These data structures are appended to the corresponding dynamic URLs. The modified URLs with the data structures are tokenized with the resulting tokens clustered to create a hierarchical organization. Nodes of the hierarchical organization may be merged based upon occurrence or patterns of content and structure. The merged hierarchical organization may then be pruned to remove irrelevant information and to reduce the memory footprint of the hierarchical organization. When a new dynamic URL is received, the new dynamic URL is matched to the hierarchical organization. Important parameters are taken into account and irrelevant information may be removed. Based upon the matching to the hierarchical organization, a normalized URL is returned. | 03-05-2009
20090157597 | REDUCTION OF ANNOTATIONS TO EXTRACT STRUCTURED WEB DATA - Documents, such as web pages of a domain, are annotated to facilitate extracting structured information from the documents. The documents are clustered. Each cluster is such that the documents within that cluster are similar to each other at least with respect to a first threshold, such as according to a shingling metric, where the first threshold is an 8/8 shingling match. There is at least one overlap cluster, each overlap cluster including at least one of the plurality of clusters such that documents of the at least one cluster included in that overlap cluster are similar to each other at least with respect to a second threshold that is lower than the first threshold. A particular overlap cluster is designated, as is a particular cluster of the particular overlap cluster. For the particular designated cluster, an obtained annotation is transferred to other clusters included in the designated particular overlap cluster. | 06-18-2009
20090157607 | UNSUPERVISED DETECTION OF WEB PAGES CORRESPONDING TO A SIMILARITY CLASS - A method of detecting web pages belonging to at least one similarity class from a plurality of web pages includes determining clusters of the plurality of web pages based on characteristics of the content of the web pages. For each of the determined clusters, at least one metric is determined indicative of similarity among resource locators associated with the web pages of that cluster. A determination of web pages belonging to the at least one similarity class is based on the determined clusters and the determined similarity metrics. | 06-18-2009
20090171986 | TECHNIQUES FOR CONSTRUCTING SITEMAP OR HIERARCHICAL ORGANIZATION OF WEBPAGES OF A WEBSITE USING DECISION TREES - A decision tree may be determined that is a site map for a domain of web pages. A clustering of a plurality of web pages of a domain is determined, in an unsupervised fashion, based on content-related features of the plurality of web pages. Each determined cluster includes a plurality of web pages, each of the plurality of web pages characterized by a resource locator and each of the resource locators being characterized by at least one resource locator token. The clustering is processed to organize indications of the content-related features of the plurality of web pages into a decision tree characterized by a plurality of nodes, each node characterized by a feature and a value, the feature being at least one of the resource locator tokens and the value being a value of that resource locator token. | 07-02-2009
20090313127 | SYSTEM AND METHOD FOR USING CONTEXTUAL SECTIONS OF WEB PAGE CONTENT FOR SERVING ADVERTISEMENTS IN ONLINE ADVERTISING - An improved system and method for using contextual sections of web page content for serving advertisements in online advertising is provided. A publisher may use a tool to identify sections of a web page that represent content to be used in contextual advertising. When rendered by a web browser, content from marked sections may be extracted from the web page and sent to an advertisement server for selectively matching advertisements for display to a user. Features may be identified from the content sections and used to select advertisements matching the extracted content of the web page. In particular, the features identified from the content sections may be matched with features designated by advertisers for advertisements. Web page placements may be allocated for advertisements matching the extracted content, and the advertisements may be served for display with the web page. | 12-17-2009
20090319481 | FRAMEWORK FOR AGGREGATING INFORMATION OF WEB PAGES FROM A WEBSITE - The present invention is directed towards systems and methods for extending media annotations using collective knowledge. The method according to one embodiment of the present invention comprises receiving a plurality of content items and associated annotations. The method further normalizes the plurality of associated annotations and calculates pair frequencies for the plurality of associated annotations. The method then retrieves a plurality of alternative annotations and provides the plurality of alternative annotations. | 12-24-2009
20100161588 | UNSUPERVISED DETECTION OF WEB PAGES CORRESPONDING TO A SIMILARITY CLASS - A method of detecting web pages belonging to at least one similarity class from a plurality of web pages includes determining clusters of the plurality of web pages based on characteristics of the content of the web pages. For each of the determined clusters, at least one metric is determined indicative of similarity among resource locators associated with the web pages of that cluster. A determination of web pages belonging to the at least one similarity class is based on the determined clusters and the determined similarity metrics. | 06-24-2010
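
The two-threshold shingling comparison mentioned in application 20090157597 can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the word-level shingles, the MD5-based min-hash, the lower threshold of 6 out of 8, and every function name here are assumptions chosen for illustration only.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles in text (k=3 is an assumption)."""
    words = text.split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def signature(text, num_hashes=8, k=3):
    """Min-hash signature: the smallest hash value per seeded hash function."""
    doc_shingles = shingles(text, k)
    return tuple(
        min(hashlib.md5(f"{seed}:{s}".encode("utf-8")).hexdigest()
            for s in doc_shingles)
        for seed in range(num_hashes)
    )


def matching_positions(sig_a, sig_b):
    """Count signature positions on which two documents agree."""
    return sum(1 for a, b in zip(sig_a, sig_b) if a == b)


def relation(text_a, text_b, strict=8, loose=6):
    """Classify a document pair against a strict and a looser threshold.

    strict=8 mirrors the "8/8 shingling match" in the abstract; loose=6 is
    a hypothetical value for the lower, overlap-cluster threshold.
    """
    m = matching_positions(signature(text_a), signature(text_b))
    if m >= strict:
        return "same cluster"
    if m >= loose:
        return "same overlap cluster"
    return "unrelated"
```

Under these assumptions, pages that agree on all eight signature positions would share a cluster, while pages that agree on most positions would only share an overlap cluster, which is where the abstract describes annotations being transferred.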

Mahesh Tiyyagura, Bangalore IN

Patent application number | Description | Published
20090307256 | INVERTED INDICES IN INFORMATION EXTRACTION TO IMPROVE RECORDS EXTRACTED PER ANNOTATION - A method is provided for information extraction from among a multiplicity of documents each having a corresponding document object model (DOM) comprising: computing signatures associated with nodes of a multiplicity of DOMs corresponding to the multiplicity of documents; producing an index that associates computed signatures to each document that has a DOM that has one or more nodes corresponding to such signature; annotating one or more nodes of a DOM that corresponds to the at least one selected document; wherein the one or more annotated nodes respectively correspond to one or more respective signatures included in the index; and matching the signatures that correspond to the annotated nodes with signatures in the index to determine which documents from the multiplicity of documents have one or more DOM nodes that correspond to one or more of the annotated nodes. | 12-10-2009
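
The indexing idea in the abstract above (node signatures mapped to the documents that contain them) can be sketched as follows. This is a minimal illustration, not the claimed method: using the root-to-node tag path as the "signature" is an assumption, and `PathCollector`, `build_index`, and `find_documents` are hypothetical names.

```python
from collections import defaultdict
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collect one signature per element: its tag path from the root.

    Assumes well-formed markup without void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.stack = []
        self.signatures = set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.signatures.add("/".join(self.stack))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def node_signatures(html):
    collector = PathCollector()
    collector.feed(html)
    return collector.signatures


def build_index(docs):
    """Inverted index: signature -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for sig in node_signatures(html):
            index[sig].add(doc_id)
    return index


def find_documents(index, annotated_signatures):
    """Documents having at least one node matching an annotated signature."""
    found = set()
    for sig in annotated_signatures:
        found |= index.get(sig, set())
    return found


docs = {
    "a": "<html><body><div><span>x</span></div></body></html>",
    "b": "<html><body><div><span>y</span></div></body></html>",
    "c": "<html><body><p>z</p></body></html>",
}
index = build_index(docs)
print(sorted(find_documents(index, {"html/body/div/span"})))  # ['a', 'b']
```

With such an index, annotating a single node in document "a" is enough to look up the other documents that contain a node with the same signature, without re-parsing every page.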

Mahesh Tiyyagura, Andhra Pradesh IN

Patent application number | Description | Published
20090240670 | UNIFORM RESOURCE IDENTIFIER ALIGNMENT - Subject matter disclosed herein may relate to alignment of uniform resource identifiers associated with web pages, and further may relate to multiple sequence alignment of uniform resource identifiers. In one or more example embodiments, multiple sequence alignment techniques may provide improved tokenization of uniform resource identifiers associated with web pages, which may provide improved performance of applications such as, for example, uniform resource identifier normalization, sitemap construction, etc. | 09-24-2009
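
To give a rough sense of what aligning tokenized URIs looks like, the sketch below aligns two URIs pairwise with Python's `difflib.SequenceMatcher`. This is a deliberate simplification: the abstract refers to multiple sequence alignment, which this pairwise example does not implement, and the delimiter set, the `*` wildcard, and the `example.com` URIs are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(uri):
    """Split a URI on a few common delimiters (an assumed delimiter set)."""
    return [t for t in re.split(r"[/?&=]+", uri) if t]


def align(uri_a, uri_b):
    """Pairwise alignment: keep shared tokens, mark differing runs as '*'."""
    a, b = tokenize(uri_a), tokenize(uri_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pattern = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            pattern.extend(a[i1:i2])
        else:
            pattern.append("*")
    return pattern


print(align("http://example.com/item?id=1", "http://example.com/item?id=2"))
# ['http:', 'example.com', 'item', 'id', '*']
```

Positions that stay fixed across aligned URIs suggest structural tokens, while wildcard positions suggest variable values, which is the kind of distinction that normalization or sitemap construction could build on.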