Krishna Leela Poola, Bangalore IN

Patent application number | Description | Published
20090049062 | Method for Organizing Structurally Similar Web Pages from a Web Site - Techniques are described for organizing structurally similar web pages for a website. Fingerprints are made of the structure of the web pages using shingling by placing the web page's HTML tags and attributes in sequence and encoding the tags and attributes using a standard encoding technique. Fixed-size portions of the encoded sequence are taken and a set of values extracted using independent hash functions to compute the shingles. Alternatively, a DOM tree representation of HTML of the web page is generated and each path of the DOM tree encoded and values extracted using independent hash functions to compute the shingles. A specified number of shingles are retained as the fingerprint. The pages are then clustered based upon the URL and the similarity of the shingles. The clustered hierarchical organization of pages is further pruned by various criteria including similarity of shingles or support of the cluster node in the hierarchy. | 02-19-2009
20090083266 | TECHNIQUES FOR TOKENIZING URLS - Techniques are described for tokenizing a corpus of URLs of web documents. URLs are first tokenized based upon specified generic delimiters to form components. The components are then tokenized using website-specific delimiters. Website-specific delimiters are any non-alphanumerical symbol or a unit change that is specific to a particular website. Support for website-specific delimiters and the tokens resulting from website-specific delimiters is calculated. Website-specific delimiters and tokens with support values above a specified threshold value are valid. Tokenization may also be performed by generating a graph of the corpus of URLs of web documents. Each node of the graph represents a token and each edge represents a delimiter of the URLs. The graph is traversed and the support of each edge is compared to a specified threshold value. If the support of an edge of a node is greater than the threshold, then the token corresponding to the node is valid. | 03-26-2009
20090089278 | TECHNIQUES FOR KEYWORD EXTRACTION FROM URLS USING STATISTICAL ANALYSIS - Techniques are described for keyword extraction from URLs using regular expression patterns and keyword ranking. Tokenization of URLs also generates regular expressions of URLs from a website. The regular expressions are stored in the form of any type of indexing structure. When a new URL is received, the URL is examined to determine whether the URL is from a website that has previously been tokenized. If the URL is not from such a website, then the URL is tokenized using every delimiter and unit change to extract keywords. If the URL is from a website previously processed, the corresponding regular expression is used to extract keywords from the URL. The keywords extracted from the URLs are then ranked based on any ranking methodology for better relevance and performance. | 04-02-2009
20090171986 | TECHNIQUES FOR CONSTRUCTING SITEMAP OR HIERARCHICAL ORGANIZATION OF WEBPAGES OF A WEBSITE USING DECISION TREES - A decision tree may be determined that is a site map for a domain of web pages. A clustering of a plurality of web pages of a domain is determined, in an unsupervised fashion, based on content-related features of the plurality of web pages. Each determined cluster includes a plurality of web pages, each of the plurality of web pages characterized by a resource locator and each of the resource locators being characterized by at least one resource locator token. The clustering is processed to organize indications of the content-related features of the plurality of web pages into a decision tree characterized by a plurality of nodes, each node characterized by a feature and a value, the feature being at least one of the resource locator tokens and the value being a value of that resource locator token. | 07-02-2009
20090240670 | UNIFORM RESOURCE IDENTIFIER ALIGNMENT - Subject matter disclosed herein may relate to alignment of uniform resource identifiers associated with web pages, and further may relate to multiple sequence alignment of uniform resource identifiers. In one or more example embodiments, multiple sequence alignment techniques may provide improved tokenization of uniform resource identifiers associated with web pages, which may provide improved performance of applications such as, for example, uniform resource identifier normalization, sitemap construction, etc. | 09-24-2009
20090259649 | SYSTEM AND METHOD FOR DETECTING TEMPLATES OF A WEBSITE USING HYPERLINK ANALYSIS - The present invention relates to methods, systems, and computer readable media comprising instructions for detecting templates within one or more web pages comprising a website. The method of the present invention comprises generating one or more groups of hyperlinks within a respective web page of the one or more web pages comprising the website. An in-link score is calculated for a given uniform resource locator associated with the one or more web pages comprising the website. The hyperlink groups in which the uniform resource locators associated with the one or more web pages comprising the website appear are identified. A template score is assigned to the identified hyperlink groups on the basis of the in-link score associated with the uniform resource locators to which the hyperlinks comprising the hyperlink group correspond. The hyperlink groups with template scores exceeding a given template score threshold are thereafter identified as templates. | 10-15-2009
20090285378 | SYSTEM AND METHOD FOR OBFUSCATING CONTACT NUMBERS - A system for managing and tracking contacts between businesses and prospective customers, where the customer typically calls or sends a text message to the business on a mobile telephone. A business registered with the system is assigned a temporary, dynamic contact number (obfuscated number). A customer interacts with the business through the system using the obfuscated number; the customer's number is not revealed to the business. Usage of the obfuscated number is tracked and analyzed to yield marketing information for the business. Expiration of the number may be delayed if the parties continue to maintain contact. | 11-19-2009
20090319481 | FRAMEWORK FOR AGGREGATING INFORMATION OF WEB PAGES FROM A WEBSITE - The present invention is directed towards systems and methods for extending media annotations using collective knowledge. The method according to one embodiment of the present invention comprises receiving a plurality of content items and associated annotations. The method further normalizes the plurality of associated annotations and calculates pair frequencies for the plurality of associated annotations. The method then retrieves a plurality of alternative annotations and provides the plurality of alternative annotations. | 12-24-2009
20090327304 | SYSTEMS AND METHODS FOR TOKENIZING AND INTERPRETING UNIFORM RESOURCE LOCATORS - Aspects include methods, computer readable storing instructions for such methods, and systems for processing text strings such as URLs that comprise patterns of parameters and values for such parameters, delimited in a site-specific manner. Such aspects provide for accepting a number of text strings that are expected to have a common delimiting strategy, then deeply tokenizing those text strings to arrive at a set of tokens from which are selected anchor tokens used to form patterns having the anchor tokens separated by wildcard portions for recursive processing. The patterns formed can be mapped to a tree of nodes. Information concerning relationships between nodes and between tokens within a given node, as well as other heuristics concerning which tokens are parameters and which are values can be used as observed events for producing probabilities that certain tokens are parameters or values, using a dynamic programming algorithm, such as a Viterbi algorithm. | 12-31-2009
20100069096 | APPARATUS, METHOD, AND MANUFACTURE FOR MANAGING SCALABLE AND TRACEABLE EXCHANGES OF CONTENT BETWEEN ADVERTISERS AND PUBLISHERS FOR MOBILE DEVICES - A service exchange is provided. The service exchange receives text messages (such as SMS messages), such as search queries sent by a user to the service exchange, or a text message sent from the user to another user. The service exchange determines service providers most relevant to the user, and provides the information related to the most relevant service providers to the user, including, for each relevant service provider provided to the user, a dynamically assigned, obfuscated phone number for contacting the service provider. | 03-18-2010
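
As an illustration of the structural fingerprinting summarized in the abstract of application 20090049062 above, the following is a minimal sketch and not the method claimed in the application. It collects HTML tag and attribute names in document order, takes fixed-size windows over that sequence, and keeps the smallest hash values as the fingerprint. The names `TagCollector`, `structural_fingerprint`, and `similarity`, the window size, and the number of retained shingles are all assumptions made for the example. It also simplifies the abstract's "independent hash functions" to a single SHA-1 hash with bottom-k selection.

```python
import hashlib
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects tag names and attribute names in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Record structure only: tag name plus attribute names, never text or values.
        self.tokens.append(tag)
        self.tokens.extend(name for name, _ in attrs)


def structural_fingerprint(html, window=3, keep=8):
    """Return a set of up to `keep` shingle hashes describing the page structure.

    Assumption: one hash function with bottom-k selection stands in for the
    several independent hash functions mentioned in the abstract.
    """
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    tokens = collector.tokens
    # Fixed-size windows over the tag/attribute sequence.
    count = max(len(tokens) - window + 1, 1)
    windows = {" ".join(tokens[i:i + window]) for i in range(count)}
    hashes = sorted(int(hashlib.sha1(w.encode("utf-8")).hexdigest(), 16) for w in windows)
    return frozenset(hashes[:keep])


def similarity(fp_a, fp_b):
    """Jaccard similarity between two fingerprints."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


if __name__ == "__main__":
    page_a = "<html><body><div class='x'><p>Hello</p><a href='/a'>A</a></div></body></html>"
    page_b = "<html><body><div class='y'><p>Other</p><a href='/b'>B</a></div></body></html>"
    page_c = "<html><body><table><tr><td>1</td></tr></table></body></html>"
    print(similarity(structural_fingerprint(page_a), structural_fingerprint(page_b)))
    print(similarity(structural_fingerprint(page_a), structural_fingerprint(page_c)))
```

Pages that share a template but differ in text and attribute values produce identical fingerprints here, while pages with a different tag structure share few or no shingles. A clustering step, as the abstract describes, could then group pages by URL and by this similarity score.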
