Patent application number | Description | Published |
20080263026 | Techniques for detecting duplicate web pages - Techniques are disclosed for detecting web pages with duplicate content. In one embodiment, a set of shingles is computed for each page of a group of pages. An aggregate set of shingles is determined based on the sets of shingles computed for the group of pages. A first subset from the aggregate set of shingles is determined by selecting, from the aggregate set, shingles whose frequencies in the aggregate set exceed a specified threshold. A modified set of shingles is generated for each page of the group of pages by removing, from the set of shingles for that page, any shingle included in the first subset. One or more duplicate pages in the group of pages are determined based at least in part on the modified sets of shingles generated for the group of pages. | 10-23-2008 |
20080275890 | System and method for smoothing hierarchical data using isotonic regression - An improved system and method is provided for detecting a web page template. A web page template detector may be provided for performing page-level template detection on a web page. In general, the web page template classifier may be trained using automatically generated training data, and then the web page template classifier may be applied to web pages to identify web page templates. A web page template may be detected by classifying segments of a web page as template structures, by assigning classification scores to the segments of the web page classified as template structures, and then by smoothing the classification scores assigned to the segments of the web page. Generalized isotonic regression may be applied for smoothing scores associated with the nodes of a hierarchy by minimizing an optimization function using dynamic programming. | 11-06-2008 |
20080275901 | System and method for detecting a web page - An improved system and method is provided for detecting a web page template. A web page template detector may be provided for performing page-level template detection on a web page. In general, the web page template classifier may be trained using automatically generated training data, and then the web page template classifier may be applied to web pages to identify web page templates. A web page template may be detected by classifying segments of a web page as template structures, by assigning classification scores to the segments of the web page classified as template structures, and then by smoothing the classification scores assigned to the segments of the web page. Generalized isotonic regression may be applied for smoothing scores associated with the nodes of a hierarchy by minimizing an optimization function using dynamic programming. | 11-06-2008 |
20090037447 | Mail Compression Scheme with Individual Message Decompressability - Embodiments of the present invention relate to a two-pass compression scheme that achieves compression performance on par with existing methods while admitting individual message decompression. These methods provide both storage savings and lower end-user latency. They preserve the advantages of standard text compression in exploiting short-range similarities in data, while introducing a second step to take advantage of long-range similarities often present in certain types of structured data, e.g. email archival files. | 02-05-2009 |
20090077056 | CUSTOMIZATION OF SEARCH RESULTS - Methods and apparatus are described which enable the customization of search results. Various embodiments of the invention relate to machine-readable representations of configurations of one or more components of a search results page. The machine-readable representations are operable in conjunction with a search engine to present, in response to a search query, one or more search results in an interface in accordance with the corresponding configuration. | 03-19-2009 |
20090100381 | METHOD AND SYSTEM FOR CREATING SUPERIOR INFORMATIONAL GUIDES - A method for creating informational guides includes receiving a guide specification and a guide content for a plurality of guides; publishing the plurality of guides to a Web-based network for access to users of the network; serving advertising to the plurality of published guides; and rewarding owners of the plurality of published guides by providing compensation thereto based on revenue from the served advertising. | 04-16-2009 |
20090110199 | Toolbar Signature - A method and system are provided for a web browser toolbar signature. In one example, the method includes receiving a submission of user content from a source webpage, receiving a producer identity of a producer who submitted the user content, receiving identifying information about the destination webpage, coding signed content using the user content and the producer identity, wherein the signed content includes a signature, and submitting the signed content to a server hosting the destination webpage. | 04-30-2009 |
20090112865 | HIERARCHICAL STRUCTURE ENTROPY MEASUREMENT METHODS AND SYSTEMS - Methods and apparatuses are provided for accessing taxonomic data associated with an item as classified into a taxonomy having a hierarchical structure, establishing dependency data associated with a distribution represented in the taxonomic data, and determining entropic data for the item based, at least in part, on the distribution and established dependency. | 04-30-2009 |
20090112974 | COMMUNITY-BASED WEB FILTERING - Community-based rating information is generated about a Web site, Web page or other network-accessible content for use in Web filtering operations. The rating information may relate to the appropriateness of the content for a particular audience or audiences, such as for children or for children of different age groups. The rating information is based on feedback provided by users who have accessed the content in question. Where the group of users providing feedback is sufficiently large, the rating assigned to the content will tend to accurately reflect community standards. Also, because the rating information is based on user feedback, the rating information can change over time to reflect changing community attitudes towards content. | 04-30-2009 |
20090150497 | ELECTRONIC MAIL MESSAGE HANDLING AND PRESENTATION METHODS AND SYSTEMS - Methods and apparatuses are provided for use with electronic mail messages. In one exemplary method, electronic mail messages may be presented in an order based, at least in part, on a presentation score associated with each message. The presentation score may be based, at least in part, on presentation knowledge information associated with an attribute profile. The attribute profile may, for example, be established and maintained based, at least in part, on non-selective user engagement parameters that may be determined based on a presentation of the electronic mail messages and/or identifiers associated therewith. | 06-11-2009 |
20090157646 | MITIGATION OF SEARCH ENGINE HIJACKING - The subject matter disclosed herein relates to mitigation of search engine hijacking. | 06-18-2009 |
20090157651 | Method and Apparatus for Detecting and Explaining Bursty Stream Events in Targeted Groups - A method and apparatus are provided for detecting and explaining bursty stream events in targeted groups. In one example, the method includes receiving validated bursty events, finding explanatory data sources having relevant bursty events that are relevant to the validated bursty events, wherein the explanatory sources explain the presence of the validated bursty events, correlating the validated bursty events to the relevant bursty events of the explanatory data sources to obtain burst results, and sending the burst results to a burst database that is accessible to an end user. | 06-18-2009 |
20090158249 | SYSTEM AND METHOD FOR TESTING A SOFTWARE MODULE - Systems and methods are described for testing a software module. The method comprises receiving a modified software module for use as part of a software application which includes a plurality of constituent software modules, replacing at least one of the constituent software modules with the modified software module to generate a modified software application, generating output data as a function of execution of the modified software application, and storing the output data. | 06-18-2009 |
20090164502 | SYSTEMS AND METHODS OF UNIVERSAL RESOURCE LOCATOR NORMALIZATION - Disclosed herein are methods, systems and architectures for normalizing identifiers corresponding to resources using normalization rules that can be generalized for use with different resources. By way of a non-limiting example, an identifier can be a uniform resource locator (URL), and a normalization rule can be used to normalize URLs that correspond to different resources, e.g., content. A normalization rule can be generated by generalizing two or more normalization rules corresponding to different resources, such that a content determinative component is generalized. A normalization rule can be defined to include a context portion used to determine the rule's applicability to an identifier, and a transformation portion that identifies the transformations to be applied to an applicable identifier to yield a normalized form of the URL. A generalization of two or more normalization rules can include a normalization of one or both of the context and transformation portions. | 06-25-2009 |
20090216710 | OPTIMIZING QUERY REWRITES FOR KEYWORD-BASED ADVERTISING - A system and method are disclosed for rewriting queries. The queries may be rewritten and evaluated based on an end benefit, such as an optimum advertising benefit. Queries may be associated with advertisements and the benefit of those advertisements may be used in selecting query rewrites for an original user query. Multiple query rewrites from various techniques may be analyzed to generate a subset of query rewrites that are optimized for a particular benefit. | 08-27-2009 |
20090248608 | METHOD FOR SEGMENTING WEBPAGES - A method of segmenting a webpage into visually and semantically cohesive pieces uses an optimization problem on a weighted graph, where the weights reflect whether two nodes in the webpage's DOM tree should be placed together or apart in the segmentation; the weights are informed by manually labeled data. | 10-01-2009 |
20090319288 | SUGGESTING CONTACTS FOR SOCIAL NETWORKS - A social network is managed by applying connectivity and similarity measures to social network information to identify possible new relationships between social network users, and then automatically suggesting those identified relationships to the social network users. The social network information can include user profile information and indicate existing social relationships between the users in the social network. Users can provide feedback regarding the suggestions, including indications whether the relationship was accepted, consummated, or declined. The social network information can be updated using the feedback. Similarity measures can be based on one or more of shared contacts, or common interests or activities, or content associated with social network users, or ratings within the social network of users and/or their content. Possible relationships having similarity measures that suggest the users are likely to already know each other can be omitted and not suggested. | 12-24-2009 |
20100049709 | Generating Succinct Titles for Web URLs - Methods, computer programs, and systems for generating a link title for a URL (Uniform Resource Locator) within a context webpage to be shown as a web result are provided. The method evaluates generation parameters for a plurality of sources for picking words from the link title. Further, the method generates candidates for the link title, and a likelihood is computed for each candidate. When computing the likelihood, the generation parameters, the context webpage and the words are considered. In addition, the method selects a candidate with the highest likelihood from all the computed likelihoods, and presents the URL with the selected candidate as the title. | 02-25-2010 |
20100077209 | GENERATING HARD INSTANCES OF CAPTCHAS - Methods and systems are described for enhancing the difficulty of captchas and enlarging a core of available captchas that are hard for an automated or robotic user to crack. | 03-25-2010 |
20100077210 | CAPTCHA IMAGE GENERATION - Methods and systems are described for generating captchas and enlarging a core of available captchas that are hard for an automated or robotic user to crack. | 03-25-2010 |
20100082607 | SYSTEM AND METHOD FOR AGGREGATING A LIST OF TOP RANKED OBJECTS FROM RANKED COMBINATION ATTRIBUTE LISTS USING AN EARLY TERMINATION ALGORITHM - An improved system and method for aggregating a list of top ranked objects from ranked combination lists using an early termination algorithm is provided. Ranked lists of individual object attributes may be aggregated into ranked lists of combination object attributes. The ranked lists of object attributes, including ranked lists of individual object attributes as well as ranked lists of combination object attributes, may be scanned in parallel. A fixed number of top scoring objects may be stored in a results list of top ranked objects. An upper bound of best possible aggregation scores of unseen objects in the ranked lists of object attributes may be computed to incorporate the extra information given by the combination lists of attributes. If the upper bound computed is less than the score of top scoring objects in the results list, then the top scoring objects in the results list may be output. | 04-01-2010 |
20100125479 | MEETING SCHEDULER - A method of scheduling meetings includes: providing a first specification for a first set of meetings, wherein the first specification includes for each meeting an attendee list for specifying attendees, a duration for specifying meeting length, and a window for specifying acceptable meeting times; providing a first meeting schedule for the first set of meetings in accordance with the first specification, wherein the first meeting schedule includes a start time and an end time for each meeting; specifying an additional meeting to add to the first set of meetings, whereby a second set of meetings includes the first set of meetings and the additional meeting and a corresponding second specification includes for each meeting an attendee list, a duration, and a window; and determining a second meeting schedule for the second set of meetings by adjusting the first meeting schedule to include the additional meeting in accordance with the second specification. | 05-20-2010 |
20100128987 | METHOD AND APPARATUS FOR ORGANIZING DIGITAL PHOTOGRAPHS - A method and system for organizing digital photographs is disclosed. A plurality of digital photographs are obtained. Each digital photograph in the plurality is analyzed to obtain metadata related to the digital photograph and/or photograph content information related to the digital photograph. The metadata and/or the photograph content information is then analyzed. The plurality of digital photographs are automatically organized into clusters. Each cluster is associated with one or more predetermined cluster parameters. Each cluster parameter is associated with a metadata item. The digital photographs are then displayed (e.g., in a web page) on a computing device in accordance with the cluster parameter(s). One or more of the digital photographs are reorganized into a different cluster each time additional metadata related to the one or more digital photographs is received. | 05-27-2010 |
20100228745 | SYSTEM, METHOD, AND APPARATUS FOR SORTING AT LEAST PARTIALLY DYNAMIC DATA - Embodiments of methods, apparatuses, devices and systems associated with sorting candidate values are disclosed. | 09-09-2010 |
20100228804 | CONSTRUCTING IMAGE CAPTCHAS UTILIZING PRIVATE INFORMATION OF THE IMAGES - An image CAPTCHA having one or more images, a challenge, and a correct answer to the challenge is constructed by selecting the one or more images from a plurality of candidate images based at least in part on each image's public information and private information. The private information of each of the images is accessible only to an entity responsible for constructing the CAPTCHA. Optionally, the one or more images are selected further based on the specific type of the CAPTCHA to be constructed. | 09-09-2010 |
20100281104 | CREATING SECURE SOCIAL APPLICATIONS WITH EXTENSIBLE TYPES - A social environment is provided by creating an object in response to recognition of an entity in a portion of web content, wherein the object represents the entity, the object is associated with a type selected from a set of types, and the type is associated with a schema selected from a set of schemas, where the social environment includes a set of objects including the object, wherein the objects are instances of corresponding types in a rich system of predefined types, the schemas are associated with the types, metadata is associated with the objects, and there is at least one relationship between at least two objects selected from the set of objects, where the set of objects and the metadata are extensible, such that extensions provided by a first user are available for use by a second user. In one example, metadata provided by a first user is only available to a second user having a relationship with the first user. | 11-04-2010 |
20110099192 | Translation Model and Method for Matching Reviews to Objects - Disclosed are methods and apparatus for matching sets of text to objects. In accordance with one embodiment, a set of text is obtained. For instance, the set of text may include a review. A numerical value is determined for each of a plurality of objects, where the numerical value indicates a likelihood that the corresponding one of the plurality of objects is a subject of the set of text. Each of the plurality of objects has an object type defined by a set of one or more attributes, each of the set of one or more attributes having associated therewith a corresponding set of one or more parameters, wherein the numerical value is determined using the set of text and a value of each of the set of one or more parameters for each of the set of one or more attributes. One of the plurality of objects that is most likely to be the subject of the set of text is identified based upon the numerical value that has been determined for each of the plurality of objects. | 04-28-2011 |
20120084832 | Time Managed Challenge-Response Test - A method of generating a time managed challenge-response test is presented. The method identifies a geometric shape having a volume and generates an entry object of the time managed challenge-response test. The entry object is overlaid onto the geometric shape, such that the entry object is distributed over a surface of the geometric shape, and a portion of the entry object is hidden at any point in time. The geometric shape is rotated, which reveals the portion of the entry object that is hidden. A display region on a display is identified for rendering the geometric shape and the geometric shape is presented in the display region of the display. | 04-05-2012 |
20130031059 | METHOD AND SYSTEM FOR FAST SIMILARITY COMPUTATION IN HIGH DIMENSIONAL SPACE - Method, system, and programs for computing similarity. Input data is first received from one or more data sources and then analyzed to obtain an input feature vector that characterizes the input data. An index is then generated based on the input feature vector and is used to archive the input data, where the value of the index is computed based on an improved Johnson-Lindenstrauss transformation (FJLT) process. With the improved FJLT process, first, the sign of each feature in the input feature vector is randomly flipped to obtain a flipped vector. A Hadamard transformation is then applied to the flipped vector to obtain a transformed vector. An inner product between the transformed vector and a sparse vector is then computed to obtain a base vector, based on which the value of the index is determined. | 01-31-2013 |
20140164344 | METHOD AND APPARATUS FOR DETECTING AND EXPLAINING BURSTY STREAM EVENTS IN TARGETED GROUPS - A method and apparatus are provided for detecting and explaining bursty stream events in targeted groups. In one example, the method includes receiving validated bursty events, finding explanatory data sources having relevant bursty events that are relevant to the validated bursty events, wherein the explanatory sources explain the presence of the validated bursty events, correlating the validated bursty events to the relevant bursty events of the explanatory data sources to obtain burst results, and sending the burst results to a burst database that is accessible to an end user. | 06-12-2014 |
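
Illustrative sketch for application 20080263026 (duplicate web page detection): the abstract describes computing shingles per page, aggregating them, dropping shingles whose frequency exceeds a threshold, and comparing pages on what remains. The Python below is a minimal sketch of that flow, not the method claimed in the application. Several choices are assumptions made only for illustration: word-level shingles of length `k`, frequency counted as the number of pages containing a shingle, Jaccard similarity as the comparison, and all function names and default values.

```python
from collections import Counter
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles for one page (word-level is an assumption)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def remove_frequent(page_shingles, threshold):
    """Drop every shingle that appears on more than `threshold` pages."""
    # Aggregate over all pages; each page contributes a shingle at most once.
    counts = Counter(s for sh in page_shingles.values() for s in sh)
    frequent = {s for s, c in counts.items() if c > threshold}
    return {page: sh - frequent for page, sh in page_shingles.items()}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets are treated as dissimilar."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(pages, k=3, threshold=2, min_similarity=0.8):
    """Return sorted page-name pairs whose modified shingle sets are similar."""
    page_shingles = {name: shingles(text, k) for name, text in pages.items()}
    modified = remove_frequent(page_shingles, threshold)
    return [
        (a, b)
        for a, b in combinations(sorted(modified), 2)
        if jaccard(modified[a], modified[b]) >= min_similarity
    ]
```

Removing the high-frequency shingles before comparison means shared boilerplate such as navigation menus does not count toward similarity, so pages are compared on their distinctive content only.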