Pattern matching access

Subclass of:

707 - Data processing: database and file management or data structures

707001000 - DATABASE OR FILE ACCESSING

707003000 - Query processing (i.e., searching)

Entries
Document | Title | Date
20090171955 | METHODS AND SYSTEMS FOR IMPLEMENTING APPROXIMATE STRING MATCHING WITHIN A DATABASE - A computer-based method for character string matching of a candidate character string with a plurality of character string records stored in a database is described. The method includes a) identifying a set of reference character strings in the database, the reference character strings identified utilizing an optimization search for a set of dissimilar character strings, b) generating an n-gram representation for one of the reference character strings in the set of reference character strings, c) generating an n-gram representation for the candidate character string, d) determining a similarity between the n-gram representations, e) repeating steps b) and d) for the remaining reference character strings in the set of identified reference character strings, and f) indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the reference character strings in the identified set. | 07-02-2009
20100023514 | TOKENIZATION PLATFORM - A tokenization platform and method is described for accurately tokenizing character strings, including but not limited to non-delimited character strings of the type commonly used in Internet domain names and computer filenames, to accurately identify words and phrases occurring therein. In one embodiment, a phased tokenization approach is used in which the final phase is a lexical analysis-based tokenization using a dictionary. The dictionary may be advantageously created and updated based upon one or more query logs associated with respective information retrieval systems, thereby ensuring that the dictionary accurately reflects currently-used terminology and captures alternative spellings and presentations of words and phrases submitted by users. | 01-28-2010
20100017404 | Method and Apparatus to Elegantly and Automatically Track Emails and its Attachments for Enhanced User Convenience - An automated, embedded and intelligent E-mail Attachment Document manager automatically tracks emails and their associated attachments and assists users in locating the email message (email chain/thread) that is the original source of an email attachment document. The present invention can perform the tracking action using the given name of the saved attachment file. | 01-21-2010
20090063482 | DATA MINING TECHNIQUES FOR ENHANCING ROUTING PROBLEMS SOLUTIONS - A computer method for enhancing routing problems solutions. The method includes the steps of providing a problem database comprising a compendium of problem history; providing a solution database comprising a compendium of at least one of routing problems solutions, routing information, and routing diagnostics; and, employing a data mining technique for interrogating the problem and solution databases for generating an output data stream, the output data stream correlating problem with solution. | 03-05-2009
20100057734 | MUSIC PROCESSING METHOD, MUSIC PROCESSING APPARATUS AND PROGRAM - There is provided a music processing method including the steps of: determining code values according to a volume transition in every certain period starting from a coding start position on a time axis in music data; and generating a characteristic pattern indicating a characteristic of a volume transition of the music data using a series of the code values determined over plural periods. The code value can be configured to indicate whether an average volume in a certain period is increased or decreased from an average volume in a previous period, for example. | 03-04-2010
20080306947 | Taxonomy editor - A repository | 12-11-2008
20090193024 | Metadata Based Navigation Method - The embodiments of the present invention provide a metadata based navigation method for a web based online learning platform that facilitates intelligent interactions and knowledge sharing among the users of the portal with compatible profiles and compatible learning spaces. The communication of an authorized user in the network is limited to the learning spaces they are attached to. The authorized user can communicate and interact with another user only if both users are part of one or more learning spaces. This ensures efficient interaction among the users. The identity of any user is based on the learning spaces. | 07-30-2009
20090193023 | DATA PICKER APPLICATION - A data picker system configured to allow a user to select data from a data site for submission to a data repository comprises an interface unit configured to provide a user with an interface for the user to input location information of a data site, a parsing unit configured to parse data in the data site and extract a data set from the data site, a selection mechanism configured to permit the user to select at least a portion of the extracted data set, and a processing unit configured to process at least a portion of the extracted data set selected by the user. | 07-30-2009
20090193022 | ASSOCIATIVE MEMORY - A computer-implemented method of realizing an associative memory capable of storing a set of documents and retrieving one or more stored documents similar to an inputted query document, said method comprising: coding each document or a part of it through a corresponding feature vector consisting of a series of bits which respectively code for the presence or absence of certain features in said document; arranging the feature vectors in a matrix; generating a query feature vector based on the query document and according to the rules used for generating the feature vectors corresponding to the stored documents, such that the query vector corresponds in its length to the width of the matrix; storing the matrix column-wise; for those columns of the matrix where the query vector indicates the presence of a feature, bitwise performing one or more of preferably hardware supported logical operations between the columns of the matrix to obtain one or more additional result columns coding for a similarity measure between the query and parts or the whole of the stored documents; and said method further comprising one or a combination of the following: retrieval of one or more stored documents based on the obtained similarity measure; and/or storing a representation of a document through its feature vector into the above matrix. | 07-30-2009
20090193021 | CAMERA SYSTEM AND METHOD FOR PICTURE SHARING BASED ON CAMERA PERSPECTIVE - An electronic device may transmit point-of-view information to a server that searches an image database for images that were taken from a corresponding point-of-view. Matching images may be displayed to a user of the electronic device and the user may be provided with an option to save one or more of the matching images in place of or in addition to a picture that was captured with the electronic device. | 07-30-2009
20100010996 | METHOD FOR THE ALLOCATION OF DATA ON PHYSICAL MEDIA BY A FILE SYSTEM THAT ELIMINATES DUPLICATE DATA - The present invention is a method for the allocation of data on physical media by a file system that eliminates duplicate data. Efficient searches are employed using a unique algorithm when a compare on hash is used to achieve realtime operation of the file system. The in-memory feature of the invention allows the search to be performed in constant time. Also, the on-disk representation of search structures enables the present invention to maintain these critical search structures in a highly efficient, self-consistent and resilient manner. | 01-14-2010
20090019041 | Filename Parser and Identifier of Alternative Sources for File - The filename for an unknown work, corresponding file contents, and information associated with the filename are parsed and analyzed and compared to information relating to one or more known works, which known works comprise at least broadcast television shows and other audio visual works. Information regarding the known works may be presented to at least one user, confirmation may be obtained, and alternative sources may be presented, from which alternative sources the known works may be obtained. | 01-15-2009
20100049711 | CONTENT-BASED MATCHING OF VIDEOS USING LOCAL SPATIO-TEMPORAL FINGERPRINTS - A computer implemented method for deriving a fingerprint from video data is disclosed, comprising the steps of receiving a plurality of frames from the video data; selecting at least one key frame from the plurality of frames, the at least one key frame being selected from two consecutive frames of the plurality of frames that exhibit a maximal cumulative difference in at least one spatial feature of the two consecutive frames; detecting at least one 3D spatio-temporal feature within the at least one key frame; and encoding a spatio-temporal fingerprint based on mean luminance of the at least one 3D spatio-temporal feature. The at least one spatial feature can be intensity. The at least one 3D spatio-temporal feature can be at least one Maximally Stable Volume (MSV). Also disclosed is a method for matching video data to a database containing a plurality of video fingerprints of the type described above, comprising the steps of calculating at least one fingerprint representing at least one query frame from the video data; indexing into the database using the at least one calculated fingerprint to find a set of candidate fingerprints; applying a score to each of the candidate fingerprints; selecting a subset of candidate fingerprints as proposed frames by rank ordering the candidate fingerprints; and attempting to match at least one fingerprint of at least one proposed frame based on a comparison of gradient-based descriptors associated with the at least one query frame and the at least one proposed frame. | 02-25-2010
20100049710 | System and method for optimized filtered data feeds to capture data and send to multiple destinations - There is provided a system and method for optimized filtered data feeds to capture data and send to multiple destinations. There is provided a system comprising a memory and a processor. The memory has a database associating data feed patterns to one or more of a plurality of destinations. The processor captures data from a data feed having a data feed destination, stores the data in the memory, compares the data feed with the data feed patterns in the database to determine matched patterns, retrieves one or more destinations associated with the matched patterns, and sends the data to the data feed destination and the retrieved destinations. There is also provided a system comprising data feed sources, destinations, a network connected to the data feed sources and the destinations, and a server configured to intercept and route network traffic on the network, the server including a memory and a processor. | 02-25-2010
20090187567 | SYSTEM AND METHOD FOR DETERMINING VALID CITATION PATTERNS IN ELECTRONIC DOCUMENTS - A system and method are provided for comparing portions of document text with potential citation components, determining if individual portions correspond to a citation component, and determining if a set of portions correspond to a valid citation pattern. A set of valid citation patterns is provided. Each citation pattern may include a specified combination of citation components. The invention further relates to identifying potential citation components from text in a document, analyzing a pattern of the identified citation components by comparing the pattern to a set of stored citation patterns to determine if the potential citation is a type of citation and, if so, whether it is a valid (and/or invalid) citation pattern. Once citation patterns have been determined in the document, annotations may be inserted into the document, and subsequent action may be taken, for example, generating a list of citations, providing research services, error-handling, and/or providing other options related to the citations. | 07-23-2009
20090240694 | TECHNIQUES FOR APPLICATION DATA SCRUBBING, REPORTING, AND ANALYSIS - Techniques for application data scrubbing, reporting, and analysis are presented. A plurality of data sources are analyzed in accordance with their schemas and matching rules. Merging rules are applied to merge a number of data types across the data sources together. A report is produced for inspection and a master data source is generated. The processing can be iterated with rules modified in response to the report for purposes of refining the master data source. | 09-24-2009
20100042623 | System and method for mining and tracking business documents - Systems and methods are described that mine and track archived business documents for discovering business knowledge and intelligence using data mining, machine learning, statistics, and computational linguistics, from different linguistic sources according to their meaning. | 02-18-2010
20090157681 | METHOD FOR PROCESSING DATA AND SYSTEM THEREOF - The present invention relates to a data processing method and system for checking an interactive communication sequence (ICS) relating to a plurality of users in a communication record by using a variable time window, and checking an interactive communication sequence pattern (ICSP) that is a frequently generated interactive communication sequence from among the checked interactive communication sequences. The data processing method includes: (a) storing an inverse pair in a communication record in an interactive communication sequence set or a candidate set that is a set of inverse pairs that can be part of the interactive communication sequence; (b) generating an interactive communication sequence having a length other than 1 by combining interactive communication sequences included in the interactive communication sequence set; and (c) generating an interactive communication sequence having a length other than 1 by combining the inverse pair included in the candidate set and one of the interactive communication sequences included in the interactive communication sequence set of (a) and the interactive communication sequence generated in (b). | 06-18-2009
20090157680 | System and method for creating metadata - A system and a method create metadata for media files. The metadata may be information relating to, based on and/or associated with the media files. The metadata of the media files may be searched by one or more terminals. An event database connectable to a terminal may use a location, a date and/or a time of creation of the media files to associate specific events with the media files. Further, the specific events may be used by the database to create keywords associated with the media files. As a result, the system and the method may organize and/or may provide searching for media files. A web page may be generated for an event that accumulates the media files related to the event. | 06-18-2009
20090157679 | Method and computer program product for analyzing documents - Disclosed methods and computer program products provide tools for analyzing documents. For example, a computer program product that is stored on a computer-readable storage medium includes instructions that, when executed, cause a computer system to perform certain steps. The steps include, for example, receiving a selection of two or more documents for comparison, the documents including numbered sections, and automatically determining matching numbered sections between the two or more documents. In one embodiment, the steps further include, in response to the determining, displaying in a graphical user interface a representation of a first set of numbered sections from a first of the documents and a representation of a second set of numbered sections from a second of the documents, and indicating, in the graphical user interface, matching numbered sections between the first set of numbered sections and the second set of numbered sections. | 06-18-2009
20090157678 | Content Based Load Balancer - A content based load balancing system receives a request for data provided by a resource. The content based load balancing system searches a content history cache for a content history cache entry corresponding to the requested data. The content based history cache then selects a resource node to service the request based on the content history cache entry corresponding to the data. | 06-18-2009
20090157677 | METHOD AND SYSTEM FOR ENABLEMENT OF SOCIAL NETWORKING BASED ON ASSET OWNERSHIP - A skill finder service receives directly from a user or through a social networking service a request for help with an asset. The skill finder service finds users skilled in the asset from asset data collected by an asset management service. The skill finder service searches the asset data for assets that match the asset in the request, and for each matching asset, determines the user who owns the matching asset. The skill finder service creates a list of skilled users and sends the list to the requesting user. The requesting user can contact one or more of the skilled users for help with the asset. The skill finder service uses existing data typically obtained by asset data management services to provide a valuable service to users. Further, by using the asset data to find skilled users, the skill finder service lessens or eliminates the need for user-entered expertise listing. | 06-18-2009
20090157675 | Method and System for Processing Fraud Notifications - Methods and systems for processing fraud notifications allow an organization to classify, monitor, and shut down fraudulent websites. A system may receive reports of suspicious network sites via electronic mail, and parse such reports in order to obtain one or more attributes (e.g., an address) corresponding to the suspicious network sites. In addition, information related to these suspicious network sites may be stored in a database, and algorithms may be used in order to classify, monitor, and respond to a particular suspicious network site. Before responding to a suspicious network site, such a website may first be classified as legitimate, fraudulent or ignore. If the suspicious network site is classified as legitimate or ignore, further action might not be needed. If, however, the suspicious network site is classified as fraudulent, the fraudulent website may be monitored and further action may be taken. | 06-18-2009
20090157674 | DEVICE LEVEL PERFORMANCE MONITORING AND ANALYSIS - Methods relating to device level performance monitoring and analysis include accessing device data including a listing of network devices, with each network device having a device profile. The methods further include accessing service reports, with each report related to a service issue. A set of device profiles from the listing of network devices may be identified based on a particular device being associated with a particular service issue. The method further includes determining a set of problem prone device profiles from the set of device profiles. A system including at least one database and a report processor may be configured to associate device data with service reports, identify service issue trends from service reports, identify at least one at risk device susceptible to future service issues related to service issue trends, and provide preventative support to the identified at least one at risk device. | 06-18-2009
20090157673 | CONDITIONAL STRING SEARCH - A method and a system for efficient search of string patterns characterized by positional relationships in a character stream are disclosed. The method is based on grouping string patterns of a dictionary into at least two string sets and performing string search processes of a text of the character stream based on individual string sets with the outcome of a search process influencing a subsequent search process. A system implementing the method comprises a dictionary processor for generating string sets with corresponding text actions and search actions, a conditional search engine for locating string patterns belonging to at least one string set in a text according to a current search state, a text operator for producing an output text according to search results, and a search operator for determining a subsequent search state. | 06-18-2009
20090177656 | TECHNIQUES FOR EVALUATING PATENT IMPACTS - Techniques for evaluating patent impacts are provided. A claim of a patent is normalized and an abstract of the claim is generated. The abstract is used to search a repository of target sources and their corresponding abstracts. Related abstracts found during the search are returned for purposes of evaluating the claim in view of data sources associated with the related abstracts. | 07-09-2009
20090043769 | KEYWORD EXTRACTION METHOD - A technology is provided that facilitates Web access based on information (a search keyword) in an advertisement by extracting a search keyword from an image simulating a search box of a search engine and making a search with this search keyword. Image information is acquired, the image information is analyzed, a simulated search box area corresponding to a predetermined pattern simulating a search box is specified, and a search keyword is extracted from the simulated search box area. | 02-12-2009
20090043768 | METHOD FOR DIFFERENTIATING STATES OF N MACHINES - A differentiating system and method for differentiating states of N machines computes and stores differences between N machine states. The differentiating system takes as input a list of item keys and data for items of two or more states and produces as output a list of the item keys of items that are different between the N machine states, and the reason for the differences. Additionally, the differentiating system does not require knowledge of the item data contained in the N states. | 02-12-2009
20090043767 | Approach For Application-Specific Duplicate Detection - Techniques are provided for extracting view data from documents, where the data corresponds to an application-specific view and includes a plurality of components. Component data is identified within the view data and a view signature is generated for the view data that includes component signatures generated for each of the components of which the view data is comprised. Each component signature is generated based on the component data that corresponds to each component. The signatures generated are used to detect duplicates among the documents. | 02-12-2009
20090043766 | METHODS AND FRAMEWORK FOR CONSTRAINT-BASED ACTIVITY MINING (CMAP) - A method of mining data to discover activity patterns within the data is described. The method includes receiving data to be mined from at least one data source, determining which of a number of specified interests and constraints are associated with the mining process, selecting corresponding mining agents that combine search algorithms with propagators from the specified constraints, and finding any activity patterns that meet the specified interests and constraints. | 02-12-2009
20090043765 | SERVER AUTHENTICATION - A method of authenticating a content-provider server, the method comprising: determining a domain name of the content-provider server; obtaining a fragment of a database of IP addresses, the fragment corresponding to the domain name of the content-provider server and storing one or more IP addresses associated with the domain name; comparing the IP address of the content-provider server against the IP addresses of the fragment; and providing an indication that the IP address of the content-provider server is included or excluded from the fragment of IP addresses. Additionally, a client computer and server operable to implement the method are described. | 02-12-2009
20090006395 | SHAPE RECOGNITION METHODS AND SYSTEMS FOR SEARCHING MOLECULAR DATABASES - The present disclosure presents novel shape comparison methods. Methods for determining shape similarity between a query molecule and a target molecule and methods for screening one or more molecules in a database based on shape similarity to a query molecule are described. | 01-01-2009
20090307221 | METHOD, SYSTEM AND COMPUTER PROGRAMING FOR MAINTAINING BOOKMARKS UP-TO DATE - A solution is proposed for facilitating accessing resources of a data processing system with distributed architecture by a data processing entity of the system (with each resource that is accessible via a corresponding address). A set of bookmarks are provided for corresponding resources. Each bookmark is associated with a stored address of the corresponding resource, for accessing the corresponding resource in response to a selection of the bookmark. In the solution according to an embodiment of the invention, a signature identifying the corresponding resource is associated with each bookmark. Each bookmark is updated by verifying that the resource accessible at the stored address matches the signature. The resource matching the signature is then located via a search engine in response to the non-accessibility of the resource. It is then possible to replace the stored address with a new address of the located resource. | 12-10-2009
20090307220 | IMAGE SEARCH ENGINE EMPLOYING IMAGE CORRELATION - An Internet infrastructure that supports searching of images by correlating a search image with those of a plurality of images hosted in Internet based servers, containing an image search server, and a web browser contained in a client device that supports displaying of the images. The image search server supports delivery of search result pages to a client device based upon a search string or search image, and contains images from a plurality of Internet based web hosting servers. The image search server delivers a search result page containing images upon receiving a search string and/or search image from the web browser. The selection of images in the search result page is based upon: (i) word match, that is, by selecting images, titles of which correspond to the search string; and (ii) image correlation, that is, by selecting images, image characteristics of which correlate to those of the search image. The selection of images in the search result page also occurs on the basis of popularity. | 12-10-2009
20090307219 | IMAGE SEARCH ENGINE USING IMAGE ANALYSIS AND CATEGORIZATION - An Internet infrastructure that supports searching of images by correlating a search image with those of a plurality of images hosted in Internet based servers in selected categories, containing an image search server, and a web browser contained in a client device that supports displaying of the images. The image search server supports delivery of search result pages to a client device based upon a search string or search image, and contains images from a plurality of Internet based web hosting servers. The image search server also delivers characteristic analysis of an image to the client device upon request. The selection of images is done within selected categories and is based upon: (i) word match, that is, by selecting images, titles of which correspond to the search string; and (ii) image correlation, that is, by selecting images, image characteristics of which correlate to those of the search image. The selection of images in the search result page also occurs on the basis of popularity. The search image server also selects a category based upon the user's choice. | 12-10-2009
20090307218 | ASSOCIATIVE MEMORY AND DATA SEARCHING SYSTEM AND METHOD - A method for searching a database ( | 12-10-2009
20090094239 | Systems and Methods for Interaction Between Employers and Professional Recruiters - An automated system with multiple databases and communication methods designed to promote the sharing and transfer of useful information and data between two parties. Specifically, the system is designed for use by employers to communicate more effectively with professional recruiters in order to dramatically increase the effectiveness of matching job candidates with job requisitions. The system creates an online virtual marketplace for employers and professional recruiters to interact and conduct all phases of the recruiting process from start to finish, and thus replaces the need for other forms of recruitment. The two primary users of this system are employers and recruiters, but there is also some interaction with the system from other entities such as job candidates. For professional recruiters the system allows for the personal tracking, search, storage, matching, and submission of job candidate information and job candidate resumes, and the ability to search, view, track, match, and submit resumes against open job requisitions nationwide. For employers, the system provides a method to post detailed job descriptions, select recruiters to work with, determine recruiting fees, receive and process resumes, coordinate and track the interview process, schedule interviews and employment testing, and receive automated invoices for successful placements. | 04-09-2009
20090271407 | INTEGRATED HANDHELD COMPUTING AND TELEPHONY SYSTEM AND SERVICES - Disclosed is an integrated handheld computer and telephony system. Integration of the handheld computer and telephony system is at the physical and operational level. For example, the integrated handheld computer and telephony system physically integrates a handheld computer with a mobile (e.g., cellular) telephone. In addition, the handheld computer is distinct from the telephony system in that they are logically separable. However, they are also operationally integrated; for example, the telephony system executes a telephone application on the processor of the handheld computer. Likewise, the handheld computer can execute applications, for example, a phone book, that can be used to launch the telephony application. | 10-29-2009
20090271406 | SEQUENTIAL PATTERN DATA MINING AND VISUALIZATION - One or more processors ( | 10-29-2009
20090271405 | STATISTICAL RECORD LINKAGE CALIBRATION FOR REFLEXIVE, SYMMETRIC AND TRANSITIVE DISTANCE MEASURES AT THE FIELD AND FIELD VALUE LEVELS WITHOUT THE NEED FOR HUMAN INTERACTION - Disclosed is a system for, and method of, calculating parameters used to determine whether records and entity representations should be linked. The system and method use a symmetric, transitive and reflexive function to allow for linking records and entity representations whose field values differ. The system and method apply iterative techniques such that parameters from each linking iteration are used in the next linking iteration. The system and method need no human interaction in order to calibrate and utilize record matching formulas used for the linking decisions. | 10-29-2009
20090271404 | STATISTICAL RECORD LINKAGE CALIBRATION FOR INTERDEPENDENT FIELDS WITHOUT THE NEED FOR HUMAN INTERACTION - Disclosed is a system for, and method of, calculating parameters used to determine whether records and entity representations should be linked. The system and method take into consideration interdependent fields, e.g., fields whose constituent field values may be positively or negatively correlated. The system and method apply iterative techniques such that parameters from each linking iteration are used in the next linking iteration. The system and method need no human interaction in order to calibrate and utilize record matching formulas used for the linking decisions. | 10-29-2009
20090271403 | INFORMATION PROCESSING APPARATUS AND PRESENTING METHOD OF RELATED ITEMS - The present invention provides an information processing apparatus including: a first similarity degree calculating section for calculating the degree of similarity between a predetermined item and a calculation target item; and a related item determining section for determining a predetermined quantity of the calculation target items as the related items of the predetermined items in the descending order of the degree of similarity, wherein meta information related to the calculation target item based on the behavior history of a user is added to the predetermined items and the degree of similarity between the predetermined item and the calculation target item is calculated. | 10-29-2009
20090271402Deduplication of Data on Disk Devices Based on a Threshold Number of Sequential Blocks - Deduplication of data on disk devices based on a threshold number (THN) of sequential blocks is described herein, the threshold number being two or greater. Deduplication may be performed when a series of THN or more received blocks (THN series) match a sequence of THN or more stored blocks (THN sequence), whereby a sequence comprises blocks stored on the same track of a disk device. Deduplication may be performed using a block-comparison mechanism comprising metadata entries of stored blocks and a mapping mechanism containing mappings of deduplicated blocks to their matching blocks. The mapping mechanism may be used to perform later read requests received for the deduplicated blocks. The deduplication described herein may reduce the read latency as the number of seeks between tracks may be reduced. Also, when a seek to a different track is performed, the seek time cost is spread over THN or more blocks.10-29-2009
20090030905Method And System For Providing Links To Resources Related To A Specified Resource - The present invention is related to a computer-implemented method and system for providing links to one or more resources related to a specified resource. The method according to the present invention includes allowing a user to configure a relation comprising a matching criteria for the resource, associating the relation with the specified resource, and processing the relation to create a relation set comprising the links to the one or more related resources satisfying the matching criteria.01-29-2009
20090030904Method for the Approximate Matching of Regular Expressions, in Particular for Generating Intervention Workflows in a Telecommunication Network - A list of elements in a set of elements is matched by means of regular expressions that define respective groups of elements in the set by approximately matching by means of the regular expressions the list of elements by locating recurrences of the regular expressions in the list of elements with a maximum number of matching errors. The matching errors correspond to insertions deriving from the superposition of groups of elements related to different regular expressions. Each time the recurrence of one regular expression is located in the list, the group of elements defined by the regular expression thus located is removed from the list, while leaving in the list those elements corresponding to errors. The approximate matching can be performed by representing each regular expression in terms of Glushkov automata. The method is applicable, e.g., for generating workflows related to interventions on equipment such as equipment included in a telecommunication network or to attacks attempted against such equipment.01-29-2009
20090030901SYSTEMS AND METHODS FOR FAX BASED DIRECTED COMMUNICATIONS - Various embodiments of the present invention provide systems and methods for responding to business related queries. As one example, such methods may include providing a communication direction associated with a particular business, and receiving a query via the communication direction. The received query is directed to a third party support service where it is parsed and one or more elements of the query are compared against a prior query. A response to the query was previously supplied by the particular business. A response is provided to the query that includes at least a portion of the reply to the prior query.01-29-2009
20090234855Systems and Methods for Efficient Data Searching, Storage and Reduction - Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.09-17-2009
20090234854SEARCH SYSTEM AND SEARCH METHOD FOR SPEECH DATABASE - An acoustic feature representing speech data provided with meta data is extracted. Next, a group of acoustic features which are extracted only from the speech data containing a specific word in the meta data and not from the other speech data is extracted from obtained sub-groups of acoustic features. The word and the extracted group of acoustic features are associated with each other to be stored. When there is a search key matching the word in the input search keys, the group of acoustic features corresponding to the word is output. Accordingly, the efforts of a user for inputting a key when the user searches for speech data are reduced.09-17-2009
20090234853Finding the website of a business using the business name - A system and method are provided for augmenting information on business directory databases. Using the business name contained in a business directory database and Web data mining technology, the website of a business is found and validated, prior to enriching the database entries.09-17-2009
20090234852SUB-LINEAR APPROXIMATE STRING MATCH - Computerized search problems can be performed more quickly, efficiently and effectively by utilizing a database of potential matching items and associated similar items which are grouped, or otherwise related, by their distance, measured in change, from their respective potential matching item. An input item requiring a search for a match and, if necessary, one or more similar input items generated by making a change to the input item are compared with sub-linear effort to the database. In this manner, matches in the database within an acceptable distance, measured in change, can be quickly and effectively identified for an input item.09-17-2009
20090234851Browser Use of Directory Listing for Predictive Type-Ahead - A system and method for providing a predictive browser type-ahead that performs server queries of computer file directory listings in order to locate and present matching Uniform Resource Locator (URL) extensions as the URL is entered into the browser. The predictive type-ahead provides matching URL entries into the browser for user selection. The predictive type-ahead also continually validates the browser entries as they are made.09-17-2009
20090234850SYNCHRONIZATION OF METADATA - A system to synchronize metadata for a plurality of applications. The system includes content administration rules programmed to define policies for updating metadata in the master database and policies for propagating updates in the metadata to the plurality of applications. The metadata describes at least one asset represented as data residing in at least one of the plurality of applications. A rules engine is programmed to apply at least a first set of the content administration rules to a metadata record received from a first application of the plurality of applications to control updating corresponding metadata stored in a master database. Changes in the corresponding metadata made to the master database can be propagated to at least one second application of the plurality of applications according to a second set of the content administration rules predefined for each of the at least one second application.09-17-2009
20090234849Streaming Faceted Search - Systems and methods for streaming faceted search are provided. In accordance with one embodiment, an exemplary method comprises receiving a search query; processing the search query to find matching search results; providing the search results for display, in response to finding matching search results; and generating first facet information corresponding to the search results, wherein the first facet information is displayed as second facet information continues to be generated.09-17-2009
20090024622IMPLEMENTATION OF STREAM ALGEBRA OVER CLASS INSTANCES - Creating and executing a distributed stream processing operator graph based on a query. The operator graph includes movable stream algebra operators for processing events received from high volume data streams. The operators are partially compiled and distributed to computing devices for completion of the compilation and subsequent execution. During execution, the operators maintain minimal state information associated with received events via an expiration time assigned to each of the event instances. Additional events are generated and aggregated by the operators for communication to a service responsible for the query.01-22-2009
20080306944METHODS, SYSTEMS AND COMPUTER PROGRAM PRODUCTS FOR ANALOGY DETECTION AMONG ENTITIES USING RECIPROCAL SIMILARITY MEASURES - Analogies among entities may be detected by obtaining associative counts among the entities and computing similarity measures among given entities and other entities, using the associative counts. First and second entities are then identified as being analogies if the first entity has a strongest similarity measure with respect to the second entity and the second entity also has a strongest similarity measure with respect to the first entity. The similarity measures may be calculated using a normalized entropy inverted among a given entity and other entities.12-11-2008
20080294638METHOD AND SYSTEM FOR PARSING CONTENTS OF MEMORY DEVICE - A system and a method for parsing and/or modifying content information of a memory device are provided. A definition file including at least one memory address and at least one associated parameter is provided, wherein each of the parameters corresponds to an event description. The memory address is loaded by a user and content information is fetched from the memory device according to the memory address. The content information is compared with the parameter to find a match. Accordingly, an event description corresponding to the match can be automatically acquired and displayed.11-27-2008
20080294636METHOD OF SEARCHING FOR SUPPLEMENTARY DATA RELATED TO CONTENT DATA AND APPARATUS THEREFOR - Provided are a method of receiving metadata including keywords related to content data and searching for supplementary data by using the keywords after the content data is reproduced, and an apparatus therefor. The method comprises receiving content data to be reproduced; receiving metadata including at least one keyword related to the content data; and searching for supplementary data related to the content data by using the at least one keyword included in the metadata. According to the present invention, metadata may be accessed during reproduction of content data or after the content data is completely viewed by using a keyword table, and supplementary data may be easily found in a television (TV) browser, by displaying the keywords as recommended keyword buttons.11-27-2008
20080288490System and Methods for Keyword Searches in Unallocated Spaces - There are many situations where it is desirable for law enforcement or security officials, and others, to perform a keyword search on a confiscated memory device. Keyword searches on undeleted files are well known in the art. Often files are deleted in an effort to hinder prosecution. When a file is deleted, the information that connects one chunk of memory to another is often lost. The file is now considered to be in “unallocated space” as the operating system is not tracking the chunks that made up the file anymore. When the information that connects one chunk of memory to another is lost, there is no way in the art to perform a keyword search across multiple chunks of memory. The current invention rectifies this situation by providing ways to perform keyword searches across unallocated chunks. Additionally, the current invention provides methods to reconstruct files that have been deleted.11-20-2008
20100030780IDENTIFYING RELATED OBJECTS IN A COMPUTER DATABASE - Provided are, among other things, systems, methods and techniques for identifying related objects in a computer database. In one representative implementation: (a) a feature vector that describes an existing object is obtained; (b) comparison scores are generated between the feature vector and various sample vectors; (c) a set that includes at least one designated vector is identified from among the sample vectors by evaluating the generated comparison scores; (d) a computer database is searched for matches between label(s) for the designated vector(s) and labels for representative vectors for other objects represented in the computer database; and (e) at least one related object is identified based on the identified match(es).02-04-2010
20080208858METHOD OF MANAGING WEBSITES REGISTERED IN SEARCH ENGINE AND A SYSTEM THEREOF - The present invention relates to a search engine that provides information on a predetermined website on the Internet. According to a preferred embodiment of the present invention, there is provided a method of managing websites registered in a search engine in a search engine administration system, comprising the steps of allowing a predetermined interface module to receive information on a website and allowing a website registration module to sort the received website information by the predetermined field and then to record the sorted information in a database means; extracting an HTML file constituting web pages of the website; detecting a predetermined function that generates a pop-up window by analyzing the extracted HTML file; increasing a predetermined counter value as much as a given value depending on the number of pop-up windows generated due to the detected function; determining whether the counter value exceeds a predetermined value; and if it is determined that the counter value exceeds the predetermined value, controlling a predetermined process to be performed for the registered website.08-28-2008
20090112862Image-based search system and method - Disclosed herein is an image-based search system and method. The image-based search system includes at least one user terminal, an information communication network, a search server, and a web server. The user terminal transmits any one or more of a search term entry signal, an image selection signal and an image combination signal to a search server and receives relevant search results from the search server. The information communication network connects the user terminal, the search server and a web server to one another. The search server receives any one or more of the search term entry signal, the image selection signal and the image combination signal from the user terminal, performs searching using attribute information of an image, and transmits search results, including images, to the user terminal. The web server forms a physical space over the information communication network, in which websites, which are objects from which information is gathered by the search server, exist.04-30-2009
20090089283METHOD AND APPARATUS FOR ASSIGNING A CULTURAL CLASSIFICATION TO A NAME USING COUNTRY-OF-ASSOCIATION INFORMATION - A method and system for performing a search request for a name among a database including a plurality of names. In one implementation, the method includes receiving the search request on the name, determining a geographic location associated with the name, assigning a cultural classification to the name based on the geographic location associated with the name, and completing the search request by searching for the name among the plurality of names within the database based on the cultural classification assigned to the name.04-02-2009
20080208857PROCESSING, BROWSING AND EXTRACTING INFORMATION FROM AN ELECTRONIC DOCUMENT - The present invention relates to methods, apparatus and systems for processing an electronic document and its corresponding device. It provides methods for browsing an electronic document and its corresponding browser, and methods for extracting information segments from an electronic document and its corresponding system for the same. An example of a method for processing an electronic document comprises extracting one or more information segments of the domains to which the electronic document relates from the electronic document being written by an author, and correspondingly storing said extracted information segments with said document, wherein one or more information extraction patterns are used to extract information segments of different domains to which the electronic document relates from said document, and the extracted information segments are verified by the writer so as to ensure their correctness, reliability and readability.08-28-2008
20080270398Product affinity engine and method - The present invention relates to a computerized engine and method for using the same to determine and effect product sales to customers based on customer-driven affinities relating to the products. In some embodiments, the computerized engine includes hardware and software and uses customer or transaction data or product characteristics to calculate an output relating to the affinity. In some embodiments, the computerized engine provides an output indicative of the affinity. In some embodiments, the computerized engine is coupled to a database containing the customer or transaction data or product characteristics. The system and method can also determine a reduced affinity depending on a period of time elapsed between a pair of events being related, or as a function of the age of collected affinity data.10-30-2008
20090265348System and methods for detecting rollback - In an embodiment of a method of and system for detecting rollback of usage data, the usage data is recorded in a database. A sequence value in the database is repeatedly advanced. A copy of the sequence value is repeatedly saved to protected storage. The copy of the sequence value in the protected storage is compared with the sequence value in the database, and it is determined whether the result of the comparison is consistent with normal operation of the database since the previous save to protected storage.10-22-2009
20090171957METHOD AND SYSTEM OF APPLYING POLICY ON SCREENED FILES - Described is a mechanism comprising a data screening filter and user mode service that applies (enforces) policies regarding allowing or blocking file content of a directory, based on matching the filename against patterns associated with that directory. An administrator configures a screening policy, such as the types of files to allow in a particular directory and the types of files to block. File groups of member patterns and non-member exclusion patterns are defined and selectively collected in directory screening objects (DSOs). A directory screening object (DSO) is associated with a directory. When an I/O create request specifying a filename and a target directory is received, the filename is evaluated against the member/non-member patterns in the file groups referenced by the DSO for that directory to make an allow or block policy decision. If not matched, DSOs on parent directories are evaluated upwards seeking a policy decision.07-02-2009
20090319521NAME SEARCH USING A RANKING FUNCTION - An approach is described for performing a name search using a name search operation and a ranking operation. The name search operation may take text as input and apply a fuzzy matching operation and a lookup operation to generate a collection of candidate names with respective probability scores. In other cases, speech or handwriting recognition may generate the collection of candidate names and probability scores. The ranking operation may then rank these candidate names using a ranking function. The ranking function may rank the candidate names based on the probability scores associated with the names and at least one other factor. One such factor may reflect whether information provided by a user matches profile information associated with a candidate name under consideration. Another factor may reflect an extent of a nexus between the user and a person associated with the candidate name. Other types of factors can be used.12-24-2009
20090083267Method and System for Compressing Data - The present disclosure is directed to a method and system for compressing data. In accordance with a particular embodiment of the present disclosure, at least one data string is received. The at least one data string includes characters. A token string corresponding to the at least one data string is generated. At least one repeated substring in the at least one data string is identified. A refer-back token associated with the at least one repeated substring is generated. The refer-back token indicates a position of the at least one repeated substring and a length of the at least one repeated substring.03-26-2009
20090150393METHOD FOR ASSIGNMENT OF POINT LEVEL ADDRESS GEOCODES TO STREET NETWORKS - Assignment of point level address geocodes to street networks includes the steps of entering address data and collecting candidate point data records and segment data record matches for the entered address. Each point data record includes address elements and also includes a geocode. Each segment data record includes a centerline between segment record data points. A determination is made if there is at least one point data record match for the input address. The best point record match for the input address is selected when at least one point data record match is made. The address elements from the point record are compared to any collected segment data records. A determination is made of the best segment record match to the selected best point record and the best segment record match is selected. The selected best segment record is employed and the bearing and distance from the selected best point record is calculated to a point on the centerline of the selected best segment record to determine a geocode for the centerline point.06-11-2009
20100042622SYSTEM AND METHOD FOR COMPILING A SET OF DOMAIN NAMES TO RECOVER - In one embodiment, the present invention relates to a method and system that receives a seed term, obtains a corpus of candidate terms, where each candidate term comprises the seed term or variants of the seed term, obtains network traffic information associated with each candidate term in the corpus of candidate terms, and compiles a set of domain names based on the network traffic information. Each domain name in the set comprises one of the corpus of candidate terms.02-18-2010
20100036842Fast identification of complex strings in a data stream - A method for detecting and locating occurrence in a data stream of any complex string belonging to a predefined complex dictionary is disclosed. A complex string may comprise an arbitrary number of interleaving coherent strings and ambiguous strings. The method comprises a first process for transforming the complex dictionary into a simple structure to enable continuously conducting computationally efficient search, and a second process for examining received data in real time using the simple structure. The method may be implemented as an article of manufacture comprising at least one processor-readable medium and instructions carried on the at least one medium. The instructions cause a processor to match examined data to an object complex string belonging to the complex dictionary, where the matching process is based on equality to constituent coherent strings, and congruence to ambiguous strings, of the object complex string.02-11-2010
20080263038METHOD AND SYSTEM FOR FINDING A FOCUS OF A DOCUMENT - Method and apparatus for finding the focus of a document. A semantic network including a plurality of nodes, each representing a concept, and links, connecting the nodes and representing relations between the concepts, is used. The method includes: providing a list of terms in an input document which match concepts represented by the semantic network, and a frequency value for each matched term indicating the number of times the term appears in the document; mapping each matched term to a referent node or to a plurality of possible referent nodes in the semantic network; and assigning weights to nodes.10-23-2008
20090094238TECHNIQUES FOR IDENTIFYING A MATCHING SEARCH TERM IN AN IMAGE OF AN ELECTRONIC DOCUMENT - A technique for facilitating identification of a matching search term in one or more images includes selecting at least a portion of an image and creating search enriched metadata for a document that includes the image. The search enriched metadata includes a text portion that provides one or more search terms that are associated with the selected portion of the image and a location portion that provides a location of the selected portion of the image.04-09-2009
20090094236SELECTION OF ROWS AND VALUES FROM INDEXES WITH UPDATES - Methods and apparatus, including computer program products, for selection of rows and values from indexes with updates. In general, rows of an index may be associated with validity flags that indicate whether a row has been updated with an update inserted in a delta index; one scheme for value identifiers may be used for an index and another scheme for one or more delta indexes where all of the indexes are, to at least some extent, compressed according to dictionary-based compression; and multiple delta indexes may be used in alternation such that one delta index may accept updates while another is being updated. The delta indexes may also have validity flags and all updates, such as modifications of values, deletion of records, and inserting of new records may be handled as updates accepted by one or more delta indexes.04-09-2009
20090204614SEARCHABLE ELECTRONIC RECORDS OF UNDERGROUND FACILITY LOCATE MARKING OPERATIONS - Methods and apparatus for generating a searchable electronic record of a locate operation in which a presence or an absence of at least one underground facility within a dig area may be identified using one or more physical locate marks. Source data representing one or more input images of a geographic area comprising the dig area is received and processed so as to display at least a portion of the input image(s) on a display device. One or more digital representations of the physical locate mark(s) applied to the dig area during the locate operation are added to the displayed input image(s) as “locate mark indicators” so as to generate a marked-up image. Information relating to the marked-up image is electronically transmitted and/or electronically stored so as to generate the searchable electronic record of the locate operation.08-13-2009
20090204612APPARATUS AND METHOD FOR DYNAMIC WEB SERVICE DISCOVERY - An apparatus and method are provided to dynamically search for available Web services by persistently searching a distributed multi-level UDDI registry chain, interrogating their published technical specifications and enabling the consumer to find, bind, and invoke the desired Web service in real-time and without intervention by the consumer. The search criteria include identifying candidate published services that fall within an acceptable margin of error based on information previously published within a consumer service profile. The measure of conformance between the registry semantic map and consumer service profile is parameterized and chosen by the consumer in advance. The service profile includes an XML schema which exposes consumer profile metadata and corresponding information sets used by a rules engine for pattern matching purposes.08-13-2009
20090089284METHOD AND APPARATUS FOR AUTOMATICALLY DIFFERENTIATING BETWEEN TYPES OF NAMES STORED IN A DATA COLLECTION - A method and system for differentiating types of data stored in a data collection. In one implementation, the method includes receiving a search request on a first type of data stored in the data collection; automatically differentiating data of the first type stored in the data collection from data of other types stored in the data collection; and completing the search request using data determined to be of the first type. Automatically differentiating data of the first type includes determining a type of each data entry in the data collection based only on tokens associated with the data entry.04-02-2009
20090276425ENCODING SEARCH RESULTS AS A SEARCH PERMANENT LINK UNIFORM RESOURCE LOCATOR - A method, system, and computer program product for encoding search results as a search permanent link Uniform Resource Locator (URL). The search result permanent link URL allows a user to distribute a search query that ensures that a set of search results will be the same (known as a fixed set of search result URLs) when the search query is re-run. If there is a broken search result URL link, the broken link is highlighted within a fixed set of search result URLs.11-05-2009
20090282038Probabilistic Association Based Method and System for Determining Topical Relatedness of Domain Names - Systems, computer software and methods for calculating relatedness scores which are indicative of relatedness of pairs of domain names requested by clients are described. The method includes receiving DNS traffic data, wherein the DNS traffic data includes at least domain names requested by clients and identities of the clients requesting the domain names, generating sequences of the domain names based on the received DNS traffic data, collecting co-occurrence counts for queried pairs of domain names, applying a probabilistic association estimate to the collected counts to determine the relatedness scores of the queried pairs of domain names, and storing the determined relatedness scores.11-12-2009
20080313182METHODS, DEVICES, AND COMPUTER PROGRAM PRODUCTS FOR PREDICTIVE TEXT ENTRY IN MOBILE TERMINALS USING MULTIPLE DATABASES - A method of providing a predictive text function in a mobile terminal includes detecting entry of one or more alphanumeric characters via a user interface in a currently-executing first mobile terminal application. At least one character string including the one or more characters is identified among words in a default database associated with the first mobile terminal application and among words in an additional database associated with a second mobile terminal application. The additional database includes at least one word that is not found in the default database. Accordingly, the at least one character string including the one or more characters is displayed via the user interface in the first mobile terminal application. Related devices and computer program products are also discussed.12-18-2008
20080288489Method for Searching Patent Document by Applying Degree of Similarity and System Thereof - A method for searching patent documents by applying degree of similarity and a system thereof are disclosed. The method for searching patent documents by applying degree of similarity comprises receiving at least one search keyword from a user of the service; searching a document previously stored in a database, by the search keyword; and evaluating a degree of similarity to the search keyword on the document that is searched by the search keyword, wherein the degree of similarity is evaluated by measuring at least one degree among a degree of appearance frequency of the search keyword in the document, a degree of proximity between the search keywords, and a degree of word order between the search keywords. Therefore, patent documents may be searched by arranging the documents according to a degree of similarity to a search keyword, and not according to whether the keyword is included in the patent documents.11-20-2008
20090089285METHOD OF DETECTING SPAM HOSTS BASED ON PROPAGATING PREDICTION LABELS - Systems and methods for identifying spam hosts are disclosed in which hosts are known to the system and initially classified as spam or non-spam by a baseline classifier. The accuracy of the initial host classifications are then improved by propagating them using a random walk algorithm. The random walk used may be modified in order to obtain a weighted or skewed characterization of the host. The hosts may then be reclassified based on the characterization obtained from the random walk to obtain a final spam/non-spam classification. The final classification may then be used in many different ways including to filter search results based on host classifications so that spam hosts are not displayed or displayed last in a results set.04-02-2009
20080208853Processing device for detecting a certain computer command - The invention provides a processing device for detecting a certain computer command in a string of characters representing a uniform resource identifier, the certain command comprising a predefined command header, the command header being followed by a command name from a plurality of predefined command names. The processing device comprises a determiner for determining whether the string of characters comprises the predefined command header, the determiner being further configured to determine whether a sub-string of characters following the command header comprises the command name if the string of characters comprises the predefined command header and a provider for providing the predefined command header and the command name if the command header comprises a command name as the certain computer command.08-28-2008
20080208851SYSTEM AND METHOD FOR MONITORING AND RECOGNIZING BROADCAST DATA - A system for monitoring and recognizing audio broadcasts is described. The system includes a plurality of geographically distributed monitoring stations, each of the monitoring stations receiving unknown audio data from a plurality of audio broadcasts. A recognition system receives the unknown audio data from the plurality of monitoring stations and compares the unknown audio data against a database of signature files. The database of signature files, or index sets, corresponds to a library of known audio files, such that the recognition system is able to identify known audio files in the unknown audio stream as a result of the comparison. The system further includes a nervous system able to monitor and configure the plurality of monitoring stations and the recognition system, and a heuristics and reporting system able to analyze the results of the comparison performed by the recognition system and use metadata associated with each of the known audio files to generate a report of the contents of plurality of audio broadcasts.08-28-2008
20080208850FAST IDENTIFICATION OF COMPLEX STRINGS IN A DATA STREAM - A method for detecting and locating occurrence in a data stream of any complex string belonging to a predefined complex dictionary is disclosed. A complex string may comprise an arbitrary number of interleaving coherent strings and ambiguous strings. The method comprises a first process for transforming the complex dictionary into a simple structure to enable continuously conducting computationally efficient search, and a second process for examining received data in real time using the simple structure. The method may be realized by an article of manufacture comprising at least one processor-readable medium and instructions carried on the at least one medium. The instructions cause a processor to match examined data to an object complex string belonging to the complex dictionary, where the matching process is based on equality to constituent coherent strings, and congruence to ambiguous strings, of the object complex string.08-28-2008
20080208856Classification-Based Method and Apparatus for String Selectivity Estimation - Histogram construction and selectivity estimation for string and substring match queries in databases of data having strings associated with attributes. The histogram construction counts string-attribute pairs in the documents, and outputs string-attribute-count triples sorted by count. The collection is partitioned into buckets. A synopsis is generated for the partition, having an average selectivity or count of the string-attribute-count triples in the partition and summary information representing the set of string-attribute pairs belonging to the bucket. Subsequent queries, both for exact and substring matches, use the synopsis to estimate the selectivity of buckets.08-28-2008
20080208854Method of Syntactic Pattern Recognition of Sequences - This invention relates to the Pattern Recognition (PR) of noisy/inexact strings and sequences and particularly to syntactic Pattern Recognition. The present invention presents a process by which a user can recognize an unknown string X, which is an element of a finite, but possibly larger Dictionary, H, by processing the information contained in its noisy/inexact version, Y, where Y is assumed to contain substitution, insertion or deletion errors. The recognized string, which is the best estimate X+ of X, is defined as that element of H which minimizes the Generalized Levenshtein Distance D(X,Y) between X and Y, for all X08-28-2008
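The lookup described in the entry above (pick the dictionary element closest to a noisy string) can be sketched as follows. This is only an illustration under stated assumptions: it uses the plain unit-cost Levenshtein distance rather than the generalized, weighted distance the abstract names, and the function names `levenshtein` and `best_estimate` are hypothetical.

```python
def levenshtein(x: str, y: str) -> int:
    """Unit-cost edit distance between x and y (substitute, insert, delete)."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1,          # delete cx
                            curr[j - 1] + 1,      # insert cy
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_estimate(noisy: str, dictionary: list[str]) -> str:
    """Return the dictionary element with the smallest distance to `noisy`.

    Ties are broken by dictionary order (an assumption of this sketch).
    """
    return min(dictionary, key=lambda word: levenshtein(word, noisy))
```

For example, `best_estimate("patern", ["pattern", "lantern", "matter"])` selects "pattern", which is one deletion away from the noisy input.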
20090100055FAST SIGNATURE SCAN - Systems and methods for scanning signatures in a string field. In one implementation, the invention provides a method for signature scanning. The method includes processing one or more signatures into one or more formats that include one or more fingerprints and one or more follow-on search data structures for each fixed-size signature or signature substring such that the number of fingerprints for each fixed-size signature or signature substring is equal to a step size for a signature scanning operation and the particular fixed-size signature or signature substring is identifiable at any location within any string fields to be scanned, receiving a particular string field, identifying any signatures included in the particular string field including scanning for the fingerprints for each scan step size and searching for the follow-on search data structures at the locations where one or more fingerprints are found, and outputting any identified signatures.04-16-2009
20090276426Semantic Analytical Search and Database - A system and method for identifying a semantic meaning of searchable elements are provided. In one implementation, a system includes an adaptive machine-learning module including a pattern recognition processor. The pattern recognition processor is configured to recognize searchable elements in source information and identify a semantic meaning of the searchable elements based on contingency measures of their relationships within the source information without requiring a predefined ontology of terms. In another implementation, a method includes recognizing searchable elements in source information; and identifying a semantic meaning of the searchable elements using a pattern recognition processor based on contingency measures of searchable element relationships within the source information without requiring a predefined ontology of terms. A database index that logically represents a hash map from integer keys to hash sets, wherein the database index is configured to use joint counters to determine set intersections of searchable elements for relational discovery is also provided.11-05-2009
20090276427Method of Extracting Sections of a Data Stream - A method of extracting sections of a data stream, the sections including a set of sequences. Each sequence is encoded separately and coupled together to define a section. The method involves determining a combination of at least two sequences of the set, comparing the combination of sequences with sequences in the data stream, and rejecting or accepting extraction of the section of the data stream based upon the result of the comparison. If the combination of sequences does not include a start and end marker for the section, a search for the start and end markers is carried out before the section is extracted.11-05-2009
20090282036METHOD AND APPARATUS FOR DUMP AND LOG ANONYMIZATION (DALA) - According to one embodiment of the invention, an original dump file is received from a client machine to be forwarded to a dump file recipient. The original dump file is parsed to identify certain content of the original dump file that matches certain data patterns/categories. The original dump file is anonymized by modifying the identified content according to a predetermined algorithm, such that the identified content of the original dump file is no longer exposed, generating an anonymized dump file. The anonymized dump file is then transmitted to the dump file recipient. Technical content and infrastructure of the original dump file is maintained within the anonymized dump file after the anonymization, such that a utility application designed to process the original dump file can still process the anonymized dump file without exposing the identified content of the original dump file to the dump file recipient. Other methods and apparatuses are also described.11-12-2009
20090282039 APPARATUS FOR SECURE COMPUTATION OF STRING COMPARATORS - We present an apparatus which can be used so that one party learns the value of a string distance metric applied to a pair of strings, each of which is held by a different party, in such a way that none of the parties can learn anything else significant about the strings. This apparatus can be applied to the problem of linking records from different databases, where privacy and confidentiality concerns prohibit the sharing of records. The apparatus can compute two different string similarity metrics, including the bigram based Dice coefficient and the Jaro-Winkler string comparator. The apparatus can implement a three party protocol for the secure computation of the bigram based Dice coefficient and a two party protocol for the Jaro-Winkler string comparator which are secure against collusion and cheating. The apparatus implements a three party Jaro-Winkler string comparator computation which is secure in the case of semi-honest participants.11-12-2009
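For reference, the bigram based Dice coefficient named in the entry above can be sketched in the clear. This is not the secure multi-party protocol the abstract describes; it only shows the metric itself, assuming bigrams are treated as a set (repeated bigrams count once). The helper names are hypothetical.

```python
def bigrams(s: str) -> set[str]:
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_coefficient(a: str, b: str) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over the bigram sets of a and b."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        # Neither string has a bigram (length < 2): fall back to equality.
        return 1.0 if a == b else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))
```

With these definitions, "night" and "nacht" share one bigram ("ht") out of four each, giving a coefficient of 0.25.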
20090282037METHOD AND SYSTEM FOR PROVIDING CONVENIENT DICTIONARY SERVICES - A method for providing a dictionary service to a terminal, includes: (a) providing a dictionary service window in or near a web browser for displaying a webpage through a screen of the terminal if a certain item for executing dictionary services in the terminal is clicked; (b) receiving a query inputted in the provided dictionary service window wherein the query includes a query for requesting meaning, a query for requesting pronunciation, or both; and (c) searching and providing a translation data corresponding to the query for requesting meaning or a pronunciation data corresponding to the query for requesting pronunciation. The method provides a translation data and/or pronunciation data of a word or expression which the user wants to find out while web surfing through the dictionary service window.11-12-2009
20090282035KEYWORD EXPRESSION LANGUAGE FOR ONLINE SEARCH AND ADVERTISING - Media and methods are provided for creating and operating a keyword expression language. Syntax is generated as an abbreviation to represent a list of keywords. The syntax is executed as part of the keyword expression language to provide keywords. The syntax includes tokens that substitute for groups of information. Advertisers generate syntax which is subsequently used by a third-party to match to search queries and ads. The third-party may also generate keywords to match to the search queries and ads. The keywords are used to trigger advertising over the Internet.11-12-2009
20090292703Methods, Systems, and Products for Developing Tailored Content - Methods, systems, and products are disclosed for developing tailored content. A selection of content is received. Content information is received that describes the selected content. Clickstream data is received that describes at least one subscriber's action while receiving the selected content. A category is assigned to the selected content information. The content information is merged with the clickstream data to create content-choice information, and the content-choice information is stored in a database. The database is queried to obtain the content-choice information matching a search query.11-26-2009
20090292701Method and a system for indexing and searching for video documents - The invention relates to a method of indexing a video document, comprising the following steps: 11-26-2009
20090089286DOMAIN-AWARE SNIPPETS FOR SEARCH RESULTS - Techniques are disclosed for providing a domain-aware snippet for a search result. With such techniques, a domain classification component is provided for identifying a template used to generate a plurality of web pages of a domain, associating the template and content of the web pages related to the template with a Uniform Resource Locator pattern of the plurality of web pages, and storing the associated template, the related content, and the Uniform Resource Locator pattern in a database. A snippet extraction component is also provided for extracting text from a section of a web page of the plurality of web pages for a snippet of a search result corresponding to a search query, wherein the extracted text is based on a ranking value of the section and the relevance of the extracted text to the search query.04-02-2009
20080275876STORAGE MEDIUM STORING SEARCH INFORMATION AND REPRODUCING APPARATUS AND METHOD - A storage medium storing search information and a reproducing apparatus for the storage medium and method of reproducing AV data corresponding to a searching result matching a user's search condition and providing additional functions by using the searching result. The storage medium includes image data; and meta information used to provide an additional function using the image data in a predetermined searched section at a time of searching the predetermined section of the image data and reproducing the image data in the searched section. Accordingly, it is possible to provide various enhanced searching functions using various search keywords. In addition, it is possible to provide various additional functions using search information.11-06-2008
20080228769Medical Entity Extraction From Patient Data - Members of a medical entity class are extracted from patient data. A semi-supervised approach uses one or more initial medical terms such as terms from an ontology, for a given category or medical canonical entity. A larger set of medical terms is extracted from the medical information. In one example, the extraction is performed using lexical surface form features, rather than syntactical parsing.09-18-2008
20080228767Attribute Method and System - A method, software, database and system for attribute partner identification and social network based attribute analysis are presented in which attribute profiles associated with individuals can be compared, attribute partners can be identified, groups of individuals can be formed, associations between individuals can be determined, and attribute based information can be analyzed and referenced. Attributes, both genetic and non-genetic, can be analyzed and utilized not only for the basis of comparisons between individuals, but also to form groups based on overlapping genetic and non-genetic attributes.09-18-2008
20080281818SEGMENTED STORAGE AND RETRIEVAL OF NUCLEOTIDE SEQUENCE INFORMATION - Processing of genomic data is facilitated by providing a storage device with a database having a segmented sequence table. The table has a plurality of data subsets of common nucleotide sequence size n, wherein n≧2, and each data subset of common nucleotide sequence size n is separately indexed within the table. A database manager associated with the database retrieves a selected nucleotide sequence locus from the table. The selected nucleotide sequence locus is sized differently from the common nucleotide sequence size n, and the retrieving includes identifying each data subset of the segmented sequence table containing at least a portion of the selected nucleotide sequence locus, and retrieving the identified data subsets. The database manager processes the retrieved, identified data subsets to remove genomic data mapped to the nucleotide positions outside the selected nucleotide sequence locus, and outputs the selected nucleotide sequence locus.11-13-2008
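The retrieval steps in the entry above (find the segments that overlap a locus, fetch them, trim what falls outside) can be sketched as follows. This is an illustration only, assuming an in-memory dictionary stands in for the indexed table and that loci are half-open, zero-based ranges; `build_segment_table` and `retrieve_locus` are hypothetical names.

```python
def build_segment_table(sequence: str, n: int) -> dict[int, str]:
    """Split a sequence into segments of common size n, keyed by segment index."""
    if n < 2:
        raise ValueError("segment size n must be at least 2")
    return {i // n: sequence[i:i + n] for i in range(0, len(sequence), n)}


def retrieve_locus(table: dict[int, str], n: int, start: int, end: int) -> str:
    """Return positions [start, end) using only the segments that overlap them."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    first, last = start // n, (end - 1) // n          # overlapping segment indexes
    joined = "".join(table[k] for k in range(first, last + 1))
    offset = start - first * n                        # positions to trim on the left
    return joined[offset:offset + (end - start)]      # trim on the right as well
```

For a 12-base sequence stored with n = 4, retrieving positions 3 to 9 touches all three segments and trims three bases from each end of the joined data. A locus that extends past the stored sequence raises `KeyError` in this sketch.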
20080281819NON-RANDOM CONTROL DATA SET GENERATION FOR FACILITATING GENOMIC DATA PROCESSING - Processing of genomic data is facilitated by providing a control data set generation system wherein a control generator tool or process creates matched data sets for facilitating informatics analysis. These matched data sets may include genomic loci or genomic sequences, or both. The data is taken from a database of actual genomic data, including sequence and annotation data, as opposed to ad-hoc generation, sequence scrambling or the like. This produces biologically relevant and accurate results which allow for stronger controls. The controls are matched against a user-provided data set via a number of parameters.11-13-2008
20080281817Accounting for behavioral variability in web search - The concept of variability pertains to whether users exhibit consistent search interaction patterns, for example, in terms of interaction flow or information targeted. Methods are provided for analyzing variability, and then adapting search-related functionality (e.g., processes and/or interfaces) to account for variability characteristics, for example, to account for predictable search interaction behavior.11-13-2008
20080281821Concept Network - A concept network that can be generated in response to a user query. Various embodiments include analysis of structure information, for example, where such information is based at least in part on Universal Resource Locators (URLs) of Web sites or data storage locations. A concept network may be used with a search tool where the search tool searches a plurality of sites (e.g., Web sites, data storage locations, etc.). In such an example, each site location is arranged with a node. Certain ones of the nodes are connected by at least one link. The concept network selects a portion of certain ones of the nodes based on the link, wherein the at least one link is used for content purposes.11-13-2008
20090265350METHOD, SYSTEM AND KEY EXTRACTOR FOR CORRELATING ADVERTISEMENTS IN A VERTICAL SEARCH ENGINE - An advertisement correlation method in a vertical search engine, a vertical search advertisement system, and a key extractor are disclosed. The method includes: receiving a search key input by a user; obtaining the search result information according to the search key; selecting correlation keys from the search result information; searching for the advertisements correlated with the correlation keys; and providing the correlated advertisement information.10-22-2009
20090265349METHOD FOR IMPROVING LOCAL DESCRIPTORS IN PEER-TO-PEER FILE SHARING - A method for improving searches in a peer-to-peer (P2P) file sharing system that includes a plurality of server computers. A content file, identified by a descriptor including at least one metadata term and a mathematical identifier that uniquely identifies the content file in one of the server computers, is selected for searching. Other server computers are searched to find one or more matching content files; one that has a descriptor with a mathematical identifier matching the mathematical identifier of the first content file. The descriptors of the matching content files are returned to the searching server computer and used to expand the local descriptor.10-22-2009
20090265347TRAIL-BASED EXPLORATION OF A REPOSITORY OF DOCUMENTS - Techniques that support trail-based exploration by a user of a repository of documents are described herein. In one embodiment, trail definition data that specifies a trail is received. The trail includes an ordered series of waypoints including a trailhead, intermediate waypoints, and one or more trailends. In some embodiments, deadends may also be defined in the trail. A particular waypoint in the ordered series of waypoints is established as a current waypoint. Search terms can be received from a user to cause a search to be performed. It is then determined whether the search satisfies matching criteria associated with a waypoint that immediately follows the current waypoint in the ordered series of waypoints. If so, the user advances to the next waypoint. Otherwise, the user remains at the current waypoint. Finally, if a trailend is reached, then an action such as rewarding the user in some way may be performed.10-22-2009
20080288488Method and system for determining trend potentials - A method for determining a significant change of a usage of expressions provided in a network system, including the steps of determining a reference data set including at least an expression frequency of expressions provided in the network system at a predetermined first time; determining a result data set including at least an indication of an expression frequency change based on the reference data set, wherein the expression frequency change indicates the change of the expression frequency of expressions indicated by the reference data set at a predetermined second time; and extracting, from the result data set, one or more expressions according to one or more predetermined filters to determine the change of the usage of expression in the network system.11-20-2008
20080313183Apparatus and method for mapping feature catalogs - A computer-implemented method, and corresponding apparatus, is used for mapping feature catalogs. The feature catalogs include features, feature attributes, and feature attribute enumerations. The method includes the steps of accessing a first feature catalog, accessing a second feature catalog, identifying features, feature attributes, and feature enumerations in the first feature catalog, identifying potentially corresponding features, feature attributes, and feature attribute enumerations in the second feature catalog, comparing the features, feature attributes, and feature attribute enumerations of the first and the second feature catalogs to determine a degree of match, and saving data indicative of each of the matches in a database to facilitate future searches and reports.12-18-2008
20080313180IDENTIFICATION OF TOPICS FOR ONLINE DISCUSSIONS BASED ON LANGUAGE PATTERNS - A topic identification system identifies topics of online discussions by iteratively identifying topic words or keywords of the online discussions and identifying language patterns associated with those keywords. The topic identification system starts out with an initial set of keywords and identifies language patterns that each include a keyword. The topic identification system then uses the identified language patterns to identify additional keywords of the online discussion that match the patterns. The topic identification system then again identifies language patterns using the keywords including the newly identified keywords. The topic identification system may repeat the process of identifying language patterns and keywords until a termination criterion is satisfied.12-18-2008
20080313181DEFINING A WEB CRAWL SPACE - Provided are techniques for defining a web crawl space to be crawled. A seed list including one or more seed names is received from a user, wherein each seed name represents a website. In response to receiving the seed list, a web crawl space for the received seed list is generated by generating one or more allow rules.12-18-2008
20080281820Schema Matching for Data Migration - Embodiments include a system for matching an element of a source schema to an element of a target schema. The system includes a processing unit and a communication unit. The processing unit may be configured to: identify a sample data item of the element of the target schema; match a part of the sample data item to a part of a sample instance of the source schema; and match the element of the source schema to which the part of the sample instance of the source schema belongs to the element of the target schema. The communication unit may be configured to: provide the sample data item through an interface and receive the sample instance of the source schema.11-13-2008
20080228768Individual Identification by Attribute - A method, software, database and system for attribute partner identification and social network based attribute analysis are presented in which attribute profiles associated with individuals can be compared and potential partners identified. Connections can be formed within social networks based on analysis of genetic and non-genetic data. Degrees of attribute separation (genetic and non-genetic) can be utilized to analyze relationships and to identify individuals who might benefit from being connected.09-18-2008
20080201328Data Processing System and Method - A system and method for grouping separate elements, having a common characteristic, to produce at least one of an output document corresponding to a presentation or for producing the presentation.08-21-2008
20080288487Typed Relationships between Items - Aspects of the subject matter described herein relate to creating, maintaining, and using relationships between items. In aspects, items such as files, folders, and other objects may be stored in a data store. A user may desire to form a relationship between two items that provides additional semantic information regarding the relationship. To do so, an instance of an item reference is created and populated with data that associates the item reference with a source item and optionally a target item. The item reference is part of a type hierarchy and inherits properties from ancestor types. These types are included in a payload of the item reference and may be exposed to programs that seek to obtain information about the relationship indicated by the item reference. An item reference may be added without changing other data about the referenced items.11-20-2008
20080306946SYSTEMS AND METHODS OF TASK CUES - A computing system for encouraging the performance of a task comprises association data, a proxy module, a display module, and a reward module. The association data associates tags with stimuli related to performing tasks. The proxy module is configured to receive encoded data, to identify tags in the encoded data that have associated stimuli in the association data, and to generate modified encoded data that includes data representative of at least one of the stimuli. The display module is configured to receive the modified encoded data, to display information based at least in part on the modified encoded data, and to provide at least one mechanism for a user to perform a task related to at least one of the stimuli. The reward module is configured to reward a user for performing tasks related to the stimuli.12-11-2008
20080235224Digital display of color and appearance and the use thereof - The present invention is directed to a method for digital displaying images of various colors and appearances of an article and the use thereof. The invention is particularly directed to a method for displaying one or more images to select one or more matching formulas to match color and appearance of an article. The invention is even further directed to a method for displaying one or more images to select one or more matching formulas to match color and appearance of a target coating of a vehicle.09-25-2008
20080235221PREVIEWS PROVIDING VIEWABLE REGIONS FOR PROTECTED ELECTRONIC DOCUMENTS - A computer system and media for generating previews for protected electronic documents are provided. The computer system provides servers that receive rules corresponding to the protected electronic documents from owners of the protected electronic documents. The rules specify quantity and quality of each interaction, by client devices, with each protected electronic document. Additionally, the servers receive queries having query terms from the client devices. In response, the servers generate previews for the protected electronic documents that match the query. The previews are generated and transmitted to the client devices based on the rules stored by the servers.09-25-2008
20080235226PROVIDING INTERACTION BETWEEN A FIRST CONTENT SET AND A SECOND CONTENT SET IN A COMPUTER SYSTEM - Interaction is provided between a first content set and a second content set, both of which are loaded into a data structure. When an event associated with loading of the second content set is detected, the second content set is parsed to identify at least one sub-set of the second content set. The identified sub-set is checked against a first data set associated with the first content set to determine whether the identified sub-set matches the first data set. If a match is found, an action associated with the at least one identified sub-set is executed and the data structure is modified.09-25-2008
20080208852Editable user interests profile - A method for an online information system includes tracking user interactions with the online information system, storing profile information for the user based on the user interactions, and providing user access to modify the user's profile information. This system improves confidence in the system for users who are reluctant to have their online activity tracked by the system operator. The user has access to all information that the system operator has for the user, and can edit or correct that information.08-28-2008
20080235225METHOD, SYSTEM AND COMPUTER PROGRAM FOR DISCOVERING INVENTORY INFORMATION WITH DYNAMIC SELECTION OF AVAILABLE PROVIDERS - A solution (09-25-2008
20080235223Online compliance document management system - A method for performing a real time audit of compliance statistics is provided which includes storing a plurality of compliance items into a database, inputting a job application for an employee having a job designation and a work location, using the job designation, work location, and compliance items to determine a collection of required compliance items, collecting the required compliance items into a record and storing the record into a file, inputting a compliant item into the file, using the required compliance items and the compliant items to determine a compliance statistic for the employee, work location, compliance items, and compliant items, and outputting the compliance statistic.09-25-2008
20080235222SYSTEM AND METHOD FOR MEASURING SIMILARITY OF SEQUENCES WITH MULTIPLE ATTRIBUTES - A method (and structure) for quantifying an ordered sequence of data, includes receiving data of the ordered sequence and determining a skeleton of the ordered sequence. The skeleton includes a plurality of perceptually important points (PIPs) of the ordered sequence, as derived by determining one or more points of local maxima of the data over the ordered sequence.09-25-2008
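The local-maxima step mentioned in the entry above can be sketched as follows. The abstract does not say how ties, plateaus, or endpoints are handled, so this sketch assumes strict interior maxima only, and it does not attempt the full skeleton or similarity computation. The function name `local_maxima` is hypothetical.

```python
def local_maxima(values: list[float]) -> list[int]:
    """Indexes of interior points strictly greater than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]
```

These indexes could serve as candidate perceptually important points; a plateau such as two equal neighbouring values yields no maximum under the strict comparison used here.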
20080270399METHOD AND SYSTEM FOR PARALLEL FLOW-AWARED PATTERN MATCHING - A system for parallel flow-awared pattern matching and a method thereof for performing distributed detection for incoming flows are provided. The system includes a pattern-set-partitioner for partitioning a pattern set for pattern matching into a number of pattern subsets in advance, a plurality of pattern matching engines, and a scheduler. The pattern matching engines each perform pattern matching for the incoming flows. The scheduler selects a number of pattern matching engines equal to the number of the partitioned pattern subsets from all the pattern matching engines and allocates pattern matching tasks, each performing flow matching against one pattern subset, to the selected pattern matching engines. With the system and method of the present invention, distributed detection can be performed by partitioning rules/pattern set to realize load-balancing parallel flow-awared pattern matching.10-30-2008
20080270397AUTOMATED ASSEMBLY OF A COMPLEX DOCUMENT BASED ON PRODUCTION CONSTRAINTS - A method for assembling a document generates a set of candidate content items for inclusion in the document. The content items may be stored in a computer-readable storage medium. An inclusion constraint is automatically applied to the set to determine whether the set satisfies the inclusion constraint. If the set does not satisfy the inclusion constraint, a conflict may be resolved by identifying one or more candidate content items in the set to be removed and removing the identified candidate content items from the set. A document may be created that includes the candidate content items that were not removed. The document may be published.10-30-2008
20090030902SCHEMATIZED DATA INTELLIGENT ASSISTANCE FOR DEVELOPMENT ENVIRONMENTS - Intelligent assistance functionality is provided in development environments and/or other editors for schematized data. Input of a trigger character sequence can initiate an intelligent assistance box having data corresponding to a related schema. Thus, the intelligent assistance data can be dynamic as schematized data can change; the data can be queried from the schema as requested to facilitate this end. In one embodiment, the data can be an extensible markup language (XML) schema having a plurality of elements. In this regard, syntax can be entered into a development environment to effectuate an intelligent assistance box comprising the elements of the schema; a root level element can be displayed for an initial trigger character sequence. After selecting the element, another trigger sequence can be input to facilitate querying the schema for next level elements, and so on.01-29-2009
20090157676USING USER SEARCH BEHAVIOR TO PLAN ONLINE ADVERTISING CAMPAIGNS - Targeting parameters are generated for a media buy plan for advertisements to be displayed in conjunction with presenting web pages, based on a history of search events. Key phrases are received relative to a subject of the advertisements to be displayed. The received key phrases are provided as proposed key phrases to determine, from search events indicative of historical data of uses of a search service, a first subportion of search events for queries of the search service with the proposed key phrases and a second subportion of search events for queries of the search service not with the proposed key phrases. Classification processing is applied to determine potential targeting parameters associated with the first subportion and with the second subportion to identify potential targeting parameters that, statistically, contribute to membership in the first sub-population and in the second sub-population, respectively. Statistics are associated with the potential targeting parameters, based on the historical data, indicative of factors usable to determine whether to use the potential targeting parameters as actual targeting parameters of the media buy plan.06-18-2009
20090182743Registration and Maintenance of Address Data for Each Service Point in a Territory - A computer system and method is disclosed for mining current and archived address data in order to identify a preferred address for each service point in a territory. The data mining system may start in response to the presentation of a candidate address for matching. The set of mined data may be prioritized by clustering like characteristics, building similarity matrices, and by constructing dendrograms with nodes joined according to common characteristics. A computer system and method for maintaining a central database of preferred addresses is also disclosed. Selected address data gathered in a queue may be scored by characteristic, grouped by consignee location, and staged for processing. The scored queue of data may be prioritized by clustering like characteristics, building similarity matrices, and by constructing dendrograms.07-16-2009
20090125514Sequence Matching Algorithm - Sequence alignment techniques are disclosed. In one embodiment, a sparse data structure is constructed that represents respective character positions of matching character sets in input sequences. This sparse data structure may take a variety of forms, including a “tree of trees.” Once constructed, each match is linked to at most one other match using a local application of a predetermined algorithm (e.g., a Smith-Waterman-type scoring algorithm). The links between matches are analyzed and a possible alignment or set of alignments is produced.05-14-2009
20090182744STRING PATTERN ANALYSIS - A method of analyzing a string-pattern includes defining a minimum length (Lmin07-16-2009
20090164465IMAGE SEARCH SYSTEM, IMAGE SEARCH APPARATUS, AND COMPUTER READABLE MEDIUM - An image search system includes: a storage that stores a plurality of images; a first identification unit that identifies a deteriorated status of a key image used for a search; and a search processing unit that searches the images stored in the storage for a target image corresponding to the key image while referring to a deteriorated status of the key image identified by the first identification unit, and comparing the key image with the images stored in the storage.06-25-2009
20090187570APPARATUS FOR CONTROLLING SUBSCRIPTIONS - An apparatus for controlling subscriptions comprising: a detector operable to detect a subscription associated with a wildcard topic string; and an analyzer, responsive to the detection of the subscription associated with a wildcard topic string and a topic string of a topic node matching the wildcard topic string, for analyzing a first attribute of the topic node; and means for determining whether a subscriber associated with the subscription should receive a message associated with the topic string of the topic node.07-23-2009
20090187568FREE STRING MATCH ENCODING AND PREVIEW - A system for entering data in a format that is easy to use, enabling selection of encoding format and displaying the resulting search data string in format to enable generation of byte sets for supplying to free string match algorithm for application to network data.07-23-2009
20090138471METHOD AND APPARATUS FOR IDENTIFYING DATA CONTENT - A method for identifying data content comprises: establishing a character base which stores characters corresponding to various service applications and protocols; performing matching between contents of currently received data and the characters in the character base, and obtaining characters contained in the currently received data; identifying at least one of a service application and a protocol corresponding to the characters contained in the currently received data according to a mapping relation between characters and protocols as well as a mapping relation between characters and service applications. An apparatus for identifying data content is also disclosed. The technical scheme of the present invention can identify data content comprehensively and can be easily extended.05-28-2009
20090006394Systems and methods for validating an address - Systems, methods, and software determine whether a field of an input digital representation of information, such as the street name field in an address, is correct by quickly comparing the field to a list of valid choices for that field. The list of valid choices is generated based on information from the input digital representation, such as a character string. If an exact match is not found, a fuzzy match comparison determines the most closely matching valid choice. If a suitable fuzzy match is not found, then the input information is invalid. Otherwise, another field of the input information, such as the building number field of an address, is tested for validity. If the second field passes the validity check, then the fuzzy match (or exact match) for the field is valid. A fuzzy matching field may replace the input field, thereby correcting the input information.01-01-2009
20090006393APPARATUSES, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR MANAGING FILES BEING STORED IN A MEMORY - A method is provided for interacting with files being stored in a memory. The method includes sequentially receiving characters into a series that at least temporarily designates at least part of a file name for a target file being stored in a memory. As the characters are sequentially received, a respective list of files stored in the memory that have file names that are related to a current version of the series are iteratively provided. The list of files can include at least one file having a file name that fails to include the current version of the series. Also provided are corresponding apparatuses and computer program products.01-01-2009
20090006391AUTOMATIC CATEGORIZATION OF DOCUMENT THROUGH TAGGING - A system and method for identifying a keyword for tagging a document using a tagging algorithm. The keyword is matched with an existing tag. Irrelevant keywords are rejected based on a relevancy factor. The existing tag is updated based on feedback.01-01-2009
20080243842Optimizing the performance of duplicate identification by content - In accordance with the disclosure, there is provided a method for identifying duplicate documents comprising drafting a first document and creating a near unique representative string (NRS) based on the document content. The method further comprises searching for other documents with the same NRS and selectively assigning a duplicate group identification (DGI) to the first document, wherein the DGI is unique if no NRS matches are found, or the DGI is the same as the DGI of an associated duplicate document that matches the NRS. The method further comprises placing the DGI into a meta-data of the first document and recalling a list of duplicates of a particular document based upon user demand by searching the meta-data and selecting documents using the same DGI.10-02-2008
20080243841Pattern searching methods and apparatuses - A computer-based method for identifying patterns in computer text using structures defining types of patterns which are to be identified, wherein a structure comprises one or more definition items, the method comprising assigning a weighting to each structure and each definition item; searching the computer text for a pattern to be identified on the basis of a particular structure, a pattern being provisionally identified if it matches the definition given by said particular structure; in a provisionally identified pattern, determining those of the definition items making up said particular structure that have been identified in the provisionally identified pattern; combining the weightings of the determined definition items and optionally, the weighting of the particular structure, to a single quantity; assessing whether the single quantity fulfils a given condition; depending on the result of said assessment, rejecting or confirming the provisionally identified pattern.10-02-2008
20080243840COMPARING DATA SETS THROUGH IDENTIFICATION OF MATCHING BLOCKS - A computer readable storage medium stores instructions to receive a source data set and a target data set. Instructions to identify differences between the target data set and the source data set are also stored. These instructions include dividing the target data set into a set of target data blocks. Among the target data blocks at least one duplicate block in which an unbroken copy is fully duplicated within the source data set is identified. At least one modified block among the target data blocks in which an unbroken copy is not fully duplicated within the source data set is also identified. Differences between the modified block and the source data set are then determined.10-02-2008
20090187569SYSTEM AND METHOD FOR A WEB-BASED PEOPLE PICTURE DIRECTORY - An online system and a method for a web-based people picture directory provides collaborative identification of people pictured in the directory. The method includes creating profile templates for each person on earth and storing these profile templates in a central database. Next, populating the profile templates with publicly available basic information and publishing the public profile information in the web-based people picture directory. Next, retrieving a first person's own profile template, uploading the first person's personal pictures and identifying and tagging images of other persons depicted in the first person's personal pictures. Next, cross-correlating and matching the identified images of the other persons depicted in the first person's uploaded personal pictures with profile templates corresponding to the other persons and when there is a match uploading the first person's pictures depicting the identified persons' images into the corresponding profile templates of the depicted persons.07-23-2009
20090063480SYSTEM AND METHOD FOR SEARCHING A TERM IN AN ACTIVE COMPONENT OF A USER INTERFACE - Embodiments of the invention are generally directed to a system and method for searching a term in the active component of a user interface. A repository term generator generates a repository term associated with an active component of a user interface. A search engine searches for a term corresponding to the repository term. An output device provides the result corresponding to the term. In an embodiment, the result may contain one or more repository terms, the user interface provides for a selection and display of this selected repository term. In another embodiment, the selected repository term is navigated and expanded to display the term in the user interface.03-05-2009
20090327289METHODS AND SYSTEMS FOR MANAGING SIMILAR AND DISSIMILAR ENTITIES - Search criteria and potential targets of searches are each represented by a classification of attributes. The search classifications and target classifications are compared to determine whether a target matches or loosely matches the search criteria. The search classifications and target classifications may be modified to increase the chance of a match or loose match. A user can request to modify a classification using a visual interface in which information about the classification is presented. The matching approach may be implemented in conjunction with conventional matching methods to provide classifications. The matching approach is capable of interacting with users of the approach to dynamically alter the classifications being searched based on any given set of search results.12-31-2009
20090327288CONTENT ENUMERATION TECHNIQUES FOR PORTABLE DEVICES - Arrangements and techniques for enumerating portable device contents via a content management device are discussed herein. The portable device is caused to create and store a first data structure, referred to herein as a portable database, corresponding to the contents of a media library stored thereon. Upon connection to a content management device, the portable database is copied to the content management device, and is used in conjunction with information stored by the content management device in a second data structure, referred to herein as a device content table, to efficiently enumerate and provide other manipulation of the contents of the media library stored on the portable device.12-31-2009
20090049044Method for providing a searchable, comprehensive database of proposed rides - A method is disclosed that matches travelers for ride sharing according to personal preferences, such as smoking, music, allergies, drive sharing, expense sharing, number of riders, and gender, as well as basic trip details. In preferred embodiments the method is accessed via a website, and trips can be over any distance and/or by any land, air, or water vehicle. Embodiments require traveler verification by a payment and/or other means, and/or require travelers to supply identifying information. Matches can take into account ratings of travelers by other travelers. Confirmations, reminders, and ride sharing advice can be sent to riders before scheduled rides, and information about a shared ride can be sent to a non-rider. Fees can be charged, and credited if no match is accepted. Communications can be secure and requesting and/or accepting matches can be logged. Origin and/or destination radii can be automatically enlarged to provide more matches.02-19-2009
20090012958MULTIPLE STRING SEARCHING USING TERNARY CONTENT ADDRESSABLE MEMORY - A method and apparatus for multiple string searching using a ternary content addressable memory. For one embodiment, the method includes receiving a text string having a plurality of characters and performing an unanchored search of a database of stored patterns matching one or more characters of the text string using a state machine, wherein the state machine comprises a ternary content addressable memory (CAM) and wherein the performing comprises comparing a state and one of the plurality of characters with contents of a state field and a character field, respectively, stored in the ternary CAM. In various embodiments, one or more of the following search features may be supported: exact string matching, inexact string matching, single character wildcard matching, multiple character wildcard matching, case insensitive matching, parallel matching and rollback.01-08-2009
20090024625METHOD FOR AUTOMATICALLY SEARCHING FOR INFORMATION AND VIDEO APPARATUS USING THE SAME - An automatic search method and a video apparatus using the method are provided. According to the automatic search method, the video apparatus deletes characters satisfying specific conditions from additional information which is received, and generates keywords. Therefore, a user may acquire search results without needing to directly input keywords, and it is possible to prevent undesired information from being displayed as a search result.01-22-2009
20090024623System and Method to Facilitate Mapping and Storage of Data Within One or More Data Taxonomies - A system and method to facilitate mapping and storage of data within one or more data taxonomies are described. Content information is received over a network. The content information is further analyzed to determine at least one theme representing subject matter related to the content information. Finally, the content information is stored within respective predetermined categories organized within at least one taxonomy, the predetermined categories being associated with the at least one theme.01-22-2009
20090024621METHOD TO SET UP ONLINE BOOK COLLECTIONS AND FACILITATE SOCIAL INTERACTIONS ON BOOKS - A computer-implemented method for recommending books is provided, which comprises the following: receiving data representing at least one image of a collection of books; automatically determining at least one piece of standard information about the collection of books by processing the image data; and recommending at least one book that is not included in the collection of books to a person associated with the collection of books based on the collection of books.01-22-2009
20090198691DEVICE AND METHOD FOR PROVIDING FAST PHRASE INPUT - A user interface for receiving text in a device adapted to receive key input, the device having access to a dictionary database and including a controller arranged to match the key input against the dictionary database to find matching input candidates, wherein the key input corresponds to at least one character and the matching input candidate corresponds to a candidate phrase including at least one word and wherein each of the at least one character matches a first character of the at least one word of the candidate phrase and wherein the order of the at least one characters of the key input is the same as the order of the at least one words having a first character matching the at least one characters in the candidate phrase.08-06-2009
20090198690INFORMATION COMPARISON METHOD - An information comparison method, suitable for a server, is provided. The server is connected to clients. The method includes following steps. First, assign at least two comparing objects among the server and the clients. Next, set a comparing information. Then, obtain the set comparing information from the comparing objects through the server. Afterwards, perform a comparing operation on the set comparing information according to formats of the obtained comparing information. And then, generate a comparing result.08-06-2009
20090198692Methods, Systems and Computer Program Products for Analogy Detection Among Entities Using Reciprocal Similarity Measures - Analogies among entities may be detected by obtaining associative counts among the entities and computing similarity measures among given entities and other entities, using the associative counts. First and second entities are then identified as being analogies if the first entity has a strongest similarity measure with respect to the second entity and the second entity also has a strongest similarity measure with respect to the first entity. The similarity measures may be calculated using a normalized entropy inverted among a given entity and other entities.08-06-2009
20090063481SYSTEMS AND METHODS FOR DEVELOPING FEATURES FOR A PRODUCT - An embodiment relates to a method for developing features for a product. The method includes monitoring a forum for messages and parsing a newly posted message in the forum. The method also includes determining whether the newly posted message contains a new feature comment and posting the new feature comment on a future feature list in response to determining that the newly posted message contains a new feature comment.03-05-2009
20090063483System for Compiling Word Usage Frequencies - A system for assisting a user who is learning a language to prioritize words to be learned in order of usage frequency is disclosed. A frequency determination program running on a computer determines the frequency of usage of each word at a list of locations provided by the user. Different algorithms to identify what constitutes a word are employed depending upon the language of the source data. The total number of words at each location and their usage frequency found during the user session, along with a total number of words and their usage frequency for all user sessions performed regardless of location, are calculated and made available to the user. The user can view usage frequencies for words from a single location, a group of locations, or all user sessions performed.03-05-2009
20090063479SEARCH TEMPLATES - Searching a data store for parameter patterns specified in a query. A method includes receiving a query from a user including N parameter patterns. One or more alternatives are associated to one or more of the N parameter patterns. One or more templates are created. Each of the templates describes a number of microsearches. Each of the microsearches includes one or more of the N parameter patterns or the alternatives. Microsearches described by at least one of the one or more templates are enumerated. One or more sub-microsearches are performed by searching for parameter patterns and/or alternatives. Each sub-microsearch may have less than all terms needed for a full microsearch. Based on the results of the one or more sub-microsearches, one or more microsearches are eliminated from searching. The data store is searched using one or more of the remaining microsearches.03-05-2009
20090055394IDENTIFYING KEY TERMS RELATED TO SIMILAR PASSAGES - Key terms for similar passages from a large corpus are identified and used to enhance searching and browsing the corpus. The corpus contains multiple documents such as the text of books. Browsing by concept is supported by identifying a set of similar passages or quotations in documents stored in the corpus and assigning key terms to passages which links conceptually related passages together. The context of each passage instance is identified and can include, for example, the text surrounding the passage. The contexts of all similar passage instances are analyzed in order to identify key terms for the similar passage. The related key terms are analyzed to identify relationships among the key terms from different similar passage sets. The key terms can be used as a basis for navigating the documents in the corpus. The key terms enable browsing the documents in the corpus by concepts referenced in the documents.02-26-2009
20090055395Method and Apparatus for XML Data Processing - Method and apparatus for at least one of coding or decoding of data. The method comprising retrieving Extensible Markup Language (“XML”)-Unicode Transformation Format 8 (“UTF-8”) data, confirming XML-UTF-8 data in a proper format, converting a prolog located within said XML-UTF-8 data, initializing a tag and attribute lookup table, comparing a current character to a plurality of multi-character patterns, determining whether said current character can be converted to a multi-character pattern in said plurality and Unicode, converting said current character to one of ASCII and Unicode when said current character cannot be converted to said multi-character pattern in said plurality, comparing at least one subsequent character to said plurality of multi-character patterns to determine conversion of at least the current character when said current character can be converted more than one way, determining whether there are more characters.02-26-2009
20090100051DIFFERENTIATED TREATMENT OF SPONSORED SEARCH RESULTS BASED ON SEARCH CONTEXT - Methods and apparatus are described for presenting sponsored search results. A user is enabled to initiate a search from a context. The sponsored search results and organic search results are presented in a search results page in response to the search, an order of the sponsored search results and placement of subsets of the sponsored search results relative to the organic search results in the search results page having been determined with reference to contextual information relating to the context.04-16-2009
20090083269METHOD AND SYSTEM FOR IDENTIFYING WEBSITE VISITORS - A website server computer hosting a website can identify a visitor to the website by using information provided by a visitor server computer that interacts with the visitor. The information provided by the server computer, in some embodiments, can be a combination of an IP address and characteristics of a computing device from where the visitor visits the website. In some embodiments, the IP address of the visitor server computer is used. In embodiments where the visitor may be sharing the computing device with other users, the characteristics may include at least one characteristic that is uniquely associated with the visitor. The website server computer can use a visitor identifier thus generated to start tracking the pages that the visitor requests during the session and can generate and customize pages for the visitor by using characteristics originated from the visitor.03-26-2009
20090083268MANAGING VARIANTS OF ARTIFACTS IN A SOFTWARE PROCESS - In some embodiments the management of revisions to segments of code or artifacts is disclosed. Such management can assist a software developer in the development of software. In some embodiments, a developer can retrieve a versioned file from a repository, modify content of the versioned file to create a variant of the versioned file, compare the variant to the versioned file, and determine a difference between the versioned file and the variant. Then, one or more attributes can be assigned to the differences and the attributes can be indexed such that the variants can be located in response to a search. In some embodiments variants can be indexed based on a variability point to which they can be matched. Other embodiments are also disclosed.03-26-2009
20090077077Optimal selection of MSAG address for valid civic/postal address - A technique and apparatus to find an optimal MSAG-valid address in real time given an E9-1-1 caller's civic/postal address. Arrival at an accurate MSAG-valid address selection moves through as many as three well defined MSAG matching steps which may be performed sequentially, or in tandem. The three disclosed MSAG matching techniques include a Simple Match of an MSAG address, a historical data match for an MSAG address, and a pinned MSAG match performed using a commercial location engine.03-19-2009
20090100054Adaptive comparison control in a data store - Data store access circuitry is disclosed that comprises: a data store for storing values; comparator circuitry coupled to said data store and responsive to receipt of a data access request comprising an address to compare at least a portion of said address with at least a portion of one or more of said values stored in said data store so as to identify a stored value matching said address; a base value register coupled to said comparator circuitry and storing a base value corresponding to at least a portion of at least one of said stored values; and comparator control circuitry coupled to said comparator circuitry to control: (i) which portion of said address is processed as a non-shared portion and compared by said comparator circuitry with non-shared portions of said one or more stored values stored in said data store; and (ii) which portion of said address is processed as a shared portion and compared by said comparator circuitry with a shared portion of said base value stored in said base value register; wherein said shared portion of said base value has a value matching corresponding portions of all of said stored values stored within said data store.04-16-2009
20080263041Method and Apparatus for Automatic Detection and Identification of Unidentified Broadcast Audio or Video Signals - A system and method of detecting unidentified broadcast electronic media content using a self-similarity technique is presented. The process and system catalogues repeated instances of content that has not been positively identified, but are sufficiently similar as to infer repetitive broadcasts. These catalogued instances may be further processed on the basis of different broadcast channels, sources, geographic locations of broadcasts or format to further assist the identification thereof.10-23-2008
20080263040System and method for making a face call - A system and method for making a face call is provided. The method includes receiving (10-23-2008
20090100056Method And Device For Extracting Web Information - A method for extracting web information includes: selecting a number of Hypertext Markup Language, HTML, tags as tag ruler elements to generate a tag ruler from an HTML text of a web page according to sequence of the HTML text; matching the HTML text with the tag ruler elements in the tag ruler according to the sequence of the tag ruler elements in the tag ruler, segmenting web information according to matched HTML tags and saving web information segments and location information of HTML tags enclosing the web information segments in the HTML text; and determining location of HTML tags containing web information needed by a user in the HTML text, extracting web information segments corresponding to the web information needed by the user from the saved web information segments.04-16-2009
20090100052ENABLING COLLABORATIVE NETWORKS - A system and method for providing collaborative resources to a user. A search expression is received from a user. One or more keywords are determined from the search expression. One or more resources are determined responsive to the keywords and based on information related to the user, and the at least one resource is provided to the user. The one or more resources may be determined responsive to and prioritized to at least one of, for example, information in email of the user, an organization of the user, a search history of the user, an organizational position of the user, a level of experience of the user, a geographical location of the user, a geographical location of the resource, a language preference of the user, or a keyword match confidence. The resource may include at least one person and presence information associated with the at least one person.04-16-2009
20080263039PATTERN-MATCHING SYSTEM - An XML parsing system includes a pattern-matching system 10-23-2008
20080263037METHOD AND APPARATUS FOR INDICATING CONTENT SEARCH RESULTS - The present description provides a method and an apparatus of indicating content search results. According to an embodiment of the present description, provided is a method of indicating content search results, comprising: inputting a search keyword; searching for a match of the keyword among displayed contents and determining a location of the match on a display screen; and generating a directional hint indicating the location. In one feature, the present description permits a user to find a match result more conveniently and quickly and thereby improve the user experience. Other embodiments are described and claimed.10-23-2008
20080263036DOCUMENT SEARCH APPARATUS, DOCUMENT SEARCH METHOD, PROGRAM, AND STORAGE MEDIUM - An apparatus is configured to search for a document including a plurality of image components. The apparatus designates a key image to be used as a search key for an image search, sets a pattern of appearance in a document of the image component equivalent to the designated key image as a search condition, and searches for a document using the set search condition.10-23-2008
20080263035GROUPING BUSINESS PARTNERS IN E-BUSINESS TRANSACTION - A method, system and computer program product for conducting an electronic business are disclosed. An attribute of a business partner is used to identify the business partner. Business partners having the same attribute will be grouped as a business partner group. The business partner group will be used in conducting a business transaction.10-23-2008
20080263033INDEXING AND SEARCHING PRODUCT IDENTIFIERS - A method for indexing a product identifier and logical parts thereof according to one embodiment of the present invention includes receiving a product identifier; splitting the product identifier into logical parts; indexing the product identifier and the individual logical parts in association with a particular document or portion thereof in an index; and storing the index. A method for processing a search query according to another embodiment of the present invention includes receiving a search query containing one or more terms; searching a search index containing complete product identifiers and variations thereof for attempting to match the one or more terms to the product identifiers or the variations thereof; and if one or more of the terms matches a complete product identifier or variation thereof, selecting and outputting an indicator of a document, or portion thereof, associated with the matching product identifier.10-23-2008
20080263034METHOD AND APPARATUS FOR QUERYING BETWEEN SOFTWARE OBJECTS - A query fetches data from a source software object into a target software object using services and data types of the source software object. The target software object can then provide additional services which are not provided by the source software object.10-23-2008
20080263032UNSTRUCTURED AND SEMISTRUCTURED DOCUMENT PROCESSING AND SEARCHING - A method for analyzing and indexing an unstructured or semistructured document according to one embodiment includes receiving an unstructured or semistructured document; converting the document to one or more text streams; analyzing the one or more text streams for identifying textual contents of the document; analyzing the one or more text streams for identifying logical sections of the document; associating the textual contents with the logical sections; indexing the textual contents and their association with the logical sections; and saving a result of the indexing in a data storage device.10-23-2008
20090006392DATA PROFILE COMPUTATION - Architecture that provides a data profile computation technique which employs key profile computation and data pattern profile computation. Key profile computation in a data table includes both exact keys as well as approximate keys, and is based on key strengths. A key strength of 100% is an exact key, and any other percentage is an approximate key. The key strength is estimated based on the number of table rows that have duplicated attribute values. Only column sets that exceed a threshold value are returned. Pattern profiling identifies a small set of regular expression patterns which best describe the patterns within a given set of attribute values. Pattern profiling includes three phases: a first phase for determining token regular expressions, a second phase for determining candidate regular expressions, and a third phase for identifying the best regular expressions of the candidates that match the attribute values.01-01-2009
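The key-strength idea in the entry above can be illustrated with a minimal sketch. The abstract does not give the formula, so the definition used here (the percentage of rows whose values on the column set are not duplicated by any other row) is an assumption, and the names `key_strength` and `rows` are hypothetical.

```python
from collections import Counter

def key_strength(rows, columns):
    """Percentage of rows whose values on `columns` occur exactly once.

    Assumed definition for illustration only: 100.0 means the column set is
    an exact key; anything lower means it is an approximate key.
    """
    if not rows:
        return 100.0
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    unique_rows = sum(n for n in counts.values() if n == 1)
    return 100.0 * unique_rows / len(rows)

# Hypothetical sample table.
rows = [
    {"id": 1, "email": "a@example.com", "city": "Oslo"},
    {"id": 2, "email": "b@example.com", "city": "Oslo"},
    {"id": 3, "email": "b@example.com", "city": "Rome"},
    {"id": 4, "email": "c@example.com", "city": "Rome"},
]

print(key_strength(rows, ["id"]))     # exact key
print(key_strength(rows, ["email"]))  # approximate key
```

A threshold filter, as the abstract mentions, would then keep only the column sets whose strength exceeds a chosen value.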
20090144277ELECTRONIC TABLE OF CONTENTS ENTRY CLASSIFICATION AND LABELING SCHEME - Computer-storage media, computerized methods and systems for classifying character strings within electronic documents are provided. Initially, textual data, which includes one or more character strings, is extracted from an electronic version of a document, typically scanned from a physical document utilizing optical character recognition. The textual data is received at a table-of-contents (TOC) engine that extracts semantic information from the textual data. Sub-engines within the TOC engine analyze the semantic information to determine at least one appropriate classification for character strings within the textual data. Labels selected from a predetermined set of TOC-architecture labels are appended to the character strings according to the appropriate classification. The character strings, and labels appended thereto, are stored in association with each other generating an electronic document file that includes enriched textual data.06-04-2009
20090222446THREE-DIMENSIONAL OBJECT IDENTIFICATION THROUGH RESONANCE FREQUENCIES ANALYSIS - The present invention discloses a system and a method for identifying input images of three-dimensional objects using at least one mathematical method, which enables calculating the resonance frequencies (also known as the natural frequencies) of a three-dimensional object.09-03-2009
20090106245METHOD AND APPARATUS FOR IDENTIFYING AND RESOLVING CONFLICTING DATA RECORDS - A method and apparatus for identifying and resolving conflicting data records are disclosed. The individual data fields of a master record are compared with the corresponding data fields of each source record in a particular data set. For each, one of various matching algorithms is used to assign a field matching score indicating the extent to which the data in the two data fields matches. The particular algorithm used to determine the extent of a match and to assign the corresponding score is dependent on the type of the data field. Once all of the data fields for a particular source record have been analyzed, the sum of the field matching scores is tallied to determine an overall record matching score for that particular source record.04-23-2009
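The scoring scheme in the entry above (a per-field matching algorithm chosen by field type, with the field scores summed into a record score) might look roughly like the following sketch. The field names, the three scoring functions, and the 0-to-1 score range are all assumptions made for illustration; the abstract does not specify them.

```python
from difflib import SequenceMatcher

def exact_score(a, b):
    """1.0 on exact equality, otherwise 0.0 (e.g. for codes or identifiers)."""
    return 1.0 if a == b else 0.0

def text_score(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def numeric_score(a, b):
    """1.0 for equal numbers, falling towards 0.0 as the relative gap grows."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / largest)

# Hypothetical schema: the matcher used depends on the type of the field.
FIELD_MATCHERS = {"name": text_score, "zip": exact_score, "balance": numeric_score}

def record_score(master, source):
    """Sum of per-field matching scores between a master and a source record."""
    return sum(match(master[f], source[f]) for f, match in FIELD_MATCHERS.items())

master = {"name": "Jane Smith", "zip": "02139", "balance": 100.0}
source = {"name": "jane smith", "zip": "02139", "balance": 90.0}
print(record_score(master, source))
```

Source records whose total falls well below the maximum (here 3.0) would be the candidates for conflict resolution.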
20090106244DISCOVERING INTERESTINGNESS IN FACETED SEARCH - Exemplary embodiments of the present invention relate to enhanced faceted search support for OLAP queries over unstructured text as well as structured dimensions by the dynamic and automatic discovery of dimensions that are determined to be most “interesting” to a user based upon the data. Within the exemplary embodiments “interestingness” is defined as how surprising a summary along some dimensions is from a user's expectation. Further, multi-attribute facets are determined and a user is optionally permitted to specify the distribution of values that she expects, and/or the distance metric by which actual and expected distributions are to be compared.04-23-2009
20090106243SYSTEM FOR OBTAINING OF TRANSCRIPTS OF NON-TEXTUAL MEDIA - A system for obtaining textual transcripts of non-textual media is provided. The system uses a voting mechanism to determine which elements within a pool of candidate documents are included within a final transcript that is then associated with the non-textual media.04-23-2009
20090106242RESOLVING DATABASE ENTITY INFORMATION - Entity resolution in a database comprises receiving imported data comprising imported data entities each having properties each having values; receiving first user input that selects the imported data entities for resolution to existing data entities in a database; receiving second user input that specifies matching criteria for matching the imported data entities to the existing data entities, wherein each of the matching criteria comprises a matching technique; matching the imported data entities to the existing data entities using the matching criteria, resulting in creating and storing matched entity information, wherein the matched entity information is organized in matched entity data sets associated with subsets of the matching criteria that were matched; consolidating the imported data entities into the existing data entities; storing the first user input and second user input as a named criteria set for use in subsequent entity resolution operations.04-23-2009
20090198689SYSTEM AND METHOD FOR DATA PRESERVATION AND RETRIEVAL - A system and method for searching for computer environments, authenticating the computer environments, and copying data from the authenticated computer environments to a memory location. The data is marked or bound to the computer system it was copied from which provides a user with assurance that the data was obtained from a specific, authenticated source. The computer environments and the memory location may be coupled over a network.08-06-2009
20090248686SYSTEM AND METHOD FOR RETRIEVING INFORMATION FROM THE INTERNET BY MEANS OF AN INTELLIGENT SEARCH AGENT - A system and associated method for retrieving information from the Internet by an end user through use of an Intelligent Search Agent. Creating an index comprising at least one data structure corresponding to a respective forum. Next, submitting a query to at least one search engine located on a respective Internet server. After submitting the query, posting a question to at least one forum located on a respective Internet server. After posting the question, subscribing to at least one web syndication corresponding to at least one respective forum. After subscribing to at least one web syndication, receiving information from at least one web syndication. Finally, sending the information to said end user.10-01-2009
20090210418TRANSFORMATION-BASED FRAMEWORK FOR RECORD MATCHING - A transformation-based record matching technique. The technique provides a flexible way to account for synonyms and more general forms of string equivalences when performing record matching by taking as explicit input user-defined transformation rules (such as, for example, the fact that “Robert” and “Bob” are synonymous). The input string and user-defined transformation rules are used to generate a larger set of strings which are used when performing record matching. Both the input string and data elements in a database can be transformed using the user-defined transformation rules in order to generate a larger set of potential record matches. These potential record matches can then be subjected to a threshold test in order to determine one or more best matches. Additionally, signature-based similarity functions are used to improve the computational efficiency of the technique.08-20-2009
20090222445AUTOMATIC SEARCH QUERY CORRECTION - A first search query is received from a user, the first search query having one or more characters, and a search result is determined based on the first search query. Based on the search result, the first search query is determined to have an incorrect input mapping. A first keyboard layout is identified for the first search query, and a second keyboard layout is identified. A corrected search query is generated from the first search query by mapping characters from the first keyboard layout to characters in the second keyboard layout. A corrected search result is determined based on the corrected search query, and the corrected search result is presented to the user.09-03-2009
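The layout remapping in the entry above can be sketched as a character translation table. The example assumes the first layout is US QWERTY and the second is the standard Russian ЙЦУКЕН layout, and it treats an empty result list as the signal that the input mapping was wrong; both choices are illustrative assumptions, and `corrected_search` and `fake_index` are hypothetical names.

```python
# Keys in the same physical positions on the two layouts (lowercase only).
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
RUSSIAN = "йцукенгшщзхъфывапролджэячсмитьбю"
US_TO_RU = str.maketrans(QWERTY, RUSSIAN)

def corrected_search(query, search):
    """Run `search`; if it finds nothing, retry with the query remapped.

    `search` is any callable taking a query string and returning a list.
    Returns the query that was finally used together with its results.
    """
    results = search(query)
    if results:
        return query, results
    corrected = query.translate(US_TO_RU)
    return corrected, search(corrected)

# Hypothetical stand-in for a search backend.
fake_index = {"привет": ["doc1"]}

def fake_search(query):
    return fake_index.get(query, [])

# "ghbdtn" is what typing "привет" produces with the US layout active.
print(corrected_search("ghbdtn", fake_search))
```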
20090222447DATA PROCESSING APPARATUS AND DATA PROCESSING METHOD - A matching device includes a matching unit that selects a pattern consistent with data to be compared, based on a rule by which a correspondence relation between patterns of data described in a markup language is defined; and a variable control unit that substitutes, when a pattern including a variable is selected, the data to be compared located at a position corresponding to a position of the variable in the selected pattern into the variable. The variable control unit substitutes, when the data to be compared is data described in a markup language, and when the data to be compared located at the position corresponding to the position of the variable in the selected pattern, is a fragment including a plurality of nodes, the fragment into the variable.09-03-2009
20090248685Method, System and Apparatus for Matching Job Applicants with Job Openings - A networked system for matching job applicants to employers and jobs is provided.10-01-2009
20090254553MATCHING MEDIA FOR MANAGING LICENSES TO CONTENT - Matching digital media available in a multi-node system. An example embodiment receives media from media providers. Metadata may also be included with digital media files or stored separately in a database. An example matching system generates, or receives a list of candidate nodes, such as network domains, to search for potential copies of digital media. The list may be defined and/or prioritized based on countries of interest, business sectors of interest, or other business rules. An example system crawls the domains to identify media files that appear on websites that are potential matches of the media files provided by the media providers. The system may download the media files, and evaluate them relative to the provided media files. The system identifies matches and identifies owners or operators of domains that had matching media files. The system generates case records for subsequent licensing or other action regarding the matched media files.10-08-2009
20090254552HIGHLY AVAILABLE LARGE SCALE NETWORK AND INTERNET SYSTEMS - Described is a technology by which a system corresponding to a large scale application is built from subsystems that are differentiated from one another based on characteristics of each subsystem. Example characteristics include availability, reliability, redundancy, statefulness and/or performance. Subsystems are matched to known design patterns, based on each subsystem's individual characteristics. Each subsystem's characteristics are associated with that subsystem for subsequent use in operation of the system, e.g., for managing/servicing the subsystem. The known design patterns may be provided in a library, in a programming framework, in conjunction with a development tool, and/or as data associated with one or more operating system services, server systems and/or hosted services that include at least one configuration, policy and/or schema. Certain design patterns and/or characteristics patterns may be blocked to prevent their usage.10-08-2009
20090254551GUIDED ENTRY SYSTEM FOR INDIVIDUALS FOR ANNOTATING PROCESS DEVIATIONS10-08-2009
20090248687CROSS-DOMAIN MATCHING SYSTEM - A computer implemented method for analyzing a listing object to define a match to a candidate object among many possible candidate objects is disclosed. The method includes an operation to receive a listing object as an input. The method also includes an operation to generate a set of candidate objects based on characteristics of the listing object. The candidate objects are used to generate a listing-candidate pair defined by pairing the listing object with one of the candidate objects. The method may also include operations to process the listing-candidate pair such as an operation to normalize the listing object into a canonical form. Another operation can generate a matching feature vector for the listing-candidate pair, where the matching feature vector includes a matching score based on a common feature between the candidate object and the canonical form of the listing object. In another operation, the method analyzes the matching feature vector with a judging committee module to render a match judgment. The match judgment is based on evaluating the results of the judging committee module to determine whether the listing object and the candidate object are a match. The method also includes an operation that saves the match judgment to a computer readable media.10-01-2009
20090259658APPARATUS AND METHOD FOR STORING AND RETRIEVING FILES - An apparatus and a method for storing and retrieving files, the apparatus including a menu generation unit to generate a retrieval menu screen to input a retrieval condition, a token generation unit to generate a token by hashing at least one retrieval condition which is input through the retrieval menu screen, and a file retrieval unit to retrieve files matching the retrieval condition by comparing the generated token with each file information included in at least one file to be retrieved.10-15-2009
20090125517METHOD AND SYSTEM FOR KEYWORD CORRELATION IN A MOBILE ENVIRONMENT - Methods and systems for determining a suitability for a mobile client to display information are disclosed. A particular exemplary method includes receiving a plurality of sets of one or more first keywords on a mobile client, each set of first keywords associated with one or more respective first messages, monitoring user interaction of the respective first messages on the mobile client, determining a user selection rate for each unique first keyword of the plurality of sets of first keywords, receiving a set of target keywords associated with a target message, performing one or more matching operations between the set of target keywords and corresponding user selection rates to produce a set of one or more matching parameters, and displaying the target message on the mobile client dependent upon the matching parameters.05-14-2009
20090125516SYSTEM AND METHOD FOR DETECTING DUPLICATE CONTENT ITEMS - Generally, the present invention provides systems, methods and computer program products for detecting different content items with similar content by examining the anchortext of the link. A method of the present invention comprises selecting one of a plurality of websites, crawling the selected website to identify one or more content items, and downloading one or more content items of the selected website. A determination is then made as to the one or more linking relationships from the one or more content items of the selected website and one or more linking rules are learned based upon association rule mining of the one or more content items. The one or more linking rules are then applied to one or more content items of one or more websites in order to determine storage of the one or more content items based upon the one or more linking rules on a search provider's central server.05-14-2009
20090125515STRING POOLING - A start index and a length are obtained for a subset of a text sequence buffered within a parser. A string pool containing a plurality of pooled string objects is polled to determine whether any of the pooled string objects contain the subset of the text sequence buffered within the parser by using the start index and the length. One of the pooled string objects is used if it contains the subset of the text sequence, otherwise, the generation of a new pooled string object in the string pool containing the subset of the text sequence is initiated. Related techniques, apparatus, systems, and articles are described.05-14-2009
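The pooling step in the entry above can be sketched as follows. This is a simplified illustration, not the patented method: in Python the slice has to be materialised before it can be looked up, whereas a parser written in a lower-level language could compare the buffered characters against pooled strings without allocating. The class name `StringPool` and its `get` method are hypothetical.

```python
class StringPool:
    """Keeps one shared string object per distinct text value."""

    def __init__(self):
        self._pool = {}

    def get(self, buffer, start, length):
        """Return the pooled string equal to buffer[start:start + length].

        If no pooled string matches, the new string is added to the pool
        and returned, so later requests for the same text share it.
        """
        candidate = buffer[start:start + length]
        pooled = self._pool.get(candidate)
        if pooled is None:
            self._pool[candidate] = candidate
            pooled = candidate
        return pooled

    def __len__(self):
        return len(self._pool)

pool = StringPool()
text = "<item><item>"
first = pool.get(text, 1, 4)   # first "item" in the buffer
second = pool.get(text, 7, 4)  # second "item" in the buffer
print(first is second, len(pool))
```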
20080306945EXAMPLE-DRIVEN DESIGN OF EFFICIENT RECORD MATCHING QUERIES - Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (or matching) and negative (non-matching) examples to search through this space and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs be executable efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations, in that, any pair of matching records have a higher similarity value than non-matching record pairs on at least one similarity function.12-11-2008
20080306943PHRASE-BASED DETECTION OF DUPLICATE DOCUMENTS IN AN INFORMATION RETRIEVAL SYSTEM - An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results, and from the index.12-11-2008
20090077075MANAGEMENT OF LOGICAL STATEMENTS IN A DISTRIBUTED DATABASE ENVIRONMENT - A method for managing a logical statement within a distributed database includes checking, responsive to receipt of a first logical statement by a first database management system node, whether the first logical statement is stored within a segment of the distributed database; and storing, responsive to a determination that the first logical statement is not stored within a segment of the distributed database, the first logical statement in at least one of a non-unique fact table and a unique fact table.03-19-2009
20080215581CONTENT/METADATA SELECTION AND PROPAGATION SERVICE TO PROPAGATE CONTENT/METADATA TO CLIENT DEVICES - In various embodiments, one or more servers is (collectively) endowed with a core data collection and management service, and a core content/metadata selection and propagation service, to receive from client devices automatically collected user activity associated data and in response, to select and propagate content/metadata to the client device, in a more efficient, flexible and effective manner (with high relevancy).09-04-2008
20090070328METHOD AND SYSTEM FOR AUTOMATICALLY GENERATING REGULAR EXPRESSIONS FOR RELAXED MATCHING OF TEXT PATTERNS - A method and system for automatically generating regular expressions for relaxed matching of text patterns. A received input phrase expressed in a natural language is determined to be a plain text pattern. The plain text pattern is automatically tokenized, thereby generating a first token list. Rules loaded from a predefined rule set are automatically applied to the first token list to automatically generate a modified token list. The order of the rules being applied to the first token list is specified by the rule set. The modified token list is automatically converted into a regular expression that matches the plain text pattern and one or more variations of the plain text pattern. A utilization of the regular expression for an information extraction facilitates a recall and a precision of the information extraction.03-12-2009
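The pipeline in the entry above (tokenize the phrase, apply an ordered rule set to the token list, convert the result to a regular expression) can be sketched as below. The two rules and the separator pattern are invented for illustration; the abstract does not say what the predefined rule set contains.

```python
import re

def tokenize(phrase):
    """Split a plain text pattern into whitespace-delimited tokens."""
    return phrase.split()

def escape_rule(tokens):
    """Escape regex metacharacters so each token matches literally."""
    return [re.escape(t) for t in tokens]

def optional_plural_rule(tokens):
    """Let tokens that do not already end in 's' also match with a trailing 's'."""
    return [t if t.endswith("s") else t + "s?" for t in tokens]

# Hypothetical rule set; the list order is the order of application.
RULES = [escape_rule, optional_plural_rule]

def relaxed_regex(phrase):
    """Build a case-insensitive regex matching the phrase and simple variations."""
    tokens = tokenize(phrase)
    for rule in RULES:
        tokens = rule(tokens)
    # Tokens may be separated by any run of whitespace or hyphens.
    return re.compile(r"\b" + r"[\s\-]+".join(tokens) + r"\b", re.IGNORECASE)

pattern = relaxed_regex("search engine")
print(bool(pattern.search("Two Search-Engines were compared")))
```

Loosening the match in this way trades some precision for recall, which is why the order and choice of rules matter.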
20090077073Index term extraction device for document-to-be-surveyed - A device comprises input means (03-19-2009
20090077076DATABASE AND DATABASE PROCESSING METHODS - The present application relates to databases and methods for storing and searching data in database tree structures. In particular, but not exclusively, the present application relates to the processing of data stored in database tree structures for use in data packet routing applications. A method of searching a database using a search key is disclosed in which the database contains data stored in a tree structure. The tree structure includes a plurality of nodes. Data relating to a first node and a second node is stored in the database. The data includes a first node key and a second node key which is prefixed by the first node key. The tree structure is searched using a search key by traversing the second node and determining if said first node key has a prefix which matches said search key.03-19-2009
20090077074APPARATUS, COMPUTER PROGRAM PRODUCT, AND METHOD FOR SUPPORTING CONSTRUCTION OF ONTOLOGIES - To construct an ontology for a target data by re-using an existing ontology, from an aspect of the structure of the class hierarchy according to an object-oriented method and an aspect of the levels of relevance with other properties, the properties that correspond to the data items in the data serving as an ontology construction target and the extraction classes of the properties are determined as property extraction destination candidates for the ontology to be constructed. As a result, it is possible to re-use even a fine difference in the meanings among the properties in the classes. Consequently, it is possible to provide a support for constructing an effective ontology, while reducing the load on the user.03-19-2009
20090254554MUSIC SEARCHING SYSTEM AND METHOD - A music searching system and method conducting a metadata search of music based on an entered search term. Music identified from the metadata search is used as seed music to identify other acoustically complementing music. Acoustic analysis data of the seed music is compared against acoustic analysis data of potential candidates for determining whether they are acoustically complementing music. The acoustically complementing music is then displayed to the user for listening, downloading, or purchase.10-08-2009
20090210419METHOD AND SYSTEM USING MACHINE LEARNING TO AUTOMATICALLY DISCOVER HOME PAGES ON THE INTERNET - A method for automatically determining an Internet home page corresponding to a named entity identified by a specified descriptor including building a trained machine-learning model, generating candidate matches from the specified descriptor, wherein each candidate match includes an Internet address, extracting content-based features from websites associated with the Internet addresses of the candidate matches, determining a model score for each candidate match based on the content-based features using the trained machine-learning model, and determining a match from among the candidate matches according to the scores, wherein the match is returned as the Internet home page corresponding to the named entity.08-20-2009
20090204613PATTERN DETECTION APPARATUS, PATTERN DETECTION SYSTEM, PATTERN DETECTION PROGRAM AND PATTERN DETECTION METHOD - A pattern detection apparatus includes a pattern DB which stores pattern information corresponding to a file type, a management unit which receives data belonging to a file which is transferred between an information processing apparatus and an external apparatus connected thereto and is divided into the data, and an arithmetic unit which checks whether or not the data include a pattern indicated by the pattern information corresponding to the file type of the file and which reports a check result to be sent to the information processing apparatus.08-13-2009
20090144278METHOD AND SYSTEM FOR IMPROVING SOFTWARE QUALITY, USABILITY AND SUPPORT THROUGH AUTOMATED USAGE PATTERN DETECTION - A method, system and computer-readable medium for automatically determining common usage patterns of a data processing system and dynamically displaying the information back to a user are disclosed. The method includes automatically recording one or more usage patterns of a data processing system; storing the usage patterns in a repository; identifying one or more common usage patterns among the stored usage patterns; detecting a current usage pattern of a user; comparing the current usage pattern to the common usage patterns; selecting one or more of the common usage patterns that are similar to the current usage pattern; and displaying to the user an indication of one or more common usage patterns similar to the current usage pattern.06-04-2009
20090319522PROVIDING RELEVANT SPONSORED LINKS BASED ON USER INTERFACE DATA - A method for providing sponsored advertising based on interaction of a user with a user interface, such as a web browser, can be provided. The method can include providing a plurality of search results based on parameters received from the user via an online search engine or an e-commerce web site. A search result may include a link to a web page. The method further includes detecting at least one search result displayed on a viewable area of the user interface. The method further includes matching the at least one search result with at least one sponsored ad, such as by matching the content of the at least one search result with the content of the at least one sponsored ad. The method further includes providing the at least one sponsored ad for display in the viewable area of the user interface.12-24-2009
20090319523Best Match Search - An apparatus, system, and methods are presented for best match search. In one embodiment, the apparatus includes a plurality of functional modules configured to collect user profile information and a service provider criterion, match a service provider profile to at least one of the user profile information and the service provider criterion, calculate a service provider statistic based on service provider data associated with a selected service provider, and generate service provider comparison data in response to the service provider statistic. In the described embodiments, these modules include a profile collection module, a profile match module, a provider analyzer, and a provider comparison module.12-24-2009
20090112863METHOD AND APPARATUS FOR FINDING MAXIMAL FREQUENT ITMESETS OVER DATA STREAMS - The present invention relates to a method and apparatus for finding maximal frequent itemsets over data streams configured of continuously generated transactions. The method for finding maximal frequent itemsets over data streams including continuously generated transactions, when the itemsets included in previously generated transactions and a frequency of the itemsets are managed by a prefix tree and each of nodes of the prefix tree has information, such as the appearance frequency of the itemsets corresponding to the nodes in question, a maximum lifetime, which is a maximum point in time that may allow the itemsets to remain in a frequent state even when no itemset appears later, a mark indicating whether the itemsets are the maximal frequent itemsets, or the like, receiving transaction T04-30-2009
20090112864Methods for Identifying Relevant Metadata for Multimedia Data of a Large-Scale Matching System - A method for associating metadata with a multimedia content based on finding matches to similar multimedia content. A given input multimedia content is matched to at least another multimedia content with corresponding metadata. Upon determination of a match, the corresponding metadata is used as metadata of the given multimedia content. When a large number of multimedia data is compared, a ranked list of metadata is provided. The most appropriate metadata is associated to the given multimedia content based on various criteria. The method can be implemented in any applications which involve large-scale content-based clustering, recognition and classification of multimedia data, such as, content-tracking, video filtering, multimedia taxonomy generation, video fingerprinting, speech-to-text, audio classification, object recognition, video search and any other application requiring content-based signatures generation and matching for large content volumes such as, web and other large-scale databases.04-30-2009
20090112858EFFICIENT METHOD OF USING XML VALUE INDEXES WITHOUT EXACT PATH INFORMATION TO FILTER XML DOCUMENTS FOR MORE SPECIFIC XPATH QUERIES - A system and method are provided for query processing, comprising: creating an index of a database and ordering a set of index candidates from the index into a list based on a set of heuristic rules. A query defining a query path is then reduced into a list of single path expressions. Each index candidate is matched against the list of single path expressions according to the ordering of the index candidates. The matched candidate nodes are also verified to ensure that they satisfy the query path.04-30-2009
20090112860Autonomic Resolution of System Configuration - A method and apparatus are provided to support autonomic computing for system configuration. Common base events (CBEs) are generated and, based upon system configuration, are employed to monitor system resources and to resolve system configuration conflicts prior to an error. A symptom database stores a set of rules for the configuration information. The configuration CBEs for the system configuration are compared with the symptom rules, and any discrepancies between the two elements are communicated to a user prior to an occurrence of an error in the system. Accordingly, an autonomic computer system is provided to support system configuration data.04-30-2009
20090112859CITATION-BASED INFORMATION RETRIEVAL SYSTEM AND METHOD - Disclosed are a method, machine-readable code, and a system for matching one or more citation tags with citation-rich documents or with professionals who are associated with a group of citation tags. The method takes a user input that can be converted to one or more primary search tags, and accesses a matrix of pair-wise tag co-occurrence values that are related to the co-occurrence of each pair of tags extracted from documents contained in a collection of citation-rich documents, to identify, for each primary tag received those secondary tags whose pair-wise co-occurrence values with respect to the primary tag is above a selected threshold value. A tag search vector constructed from the secondary tags and optionally, the primary vectors, is used in a database search to identify those documents or professionals having the highest tag-matching score with respect to the tag search vector, and these results are then displayed to the user.04-30-2009
20080275875NAVIGATOR DATABASE SEARCH METHODS - Methods and associated apparatus allow a vehicle navigator to more efficiently search for locations in a database. According to one such method, a map is divided into tiles, and locations are associated within each tile with the tile the user is in. When queried by a user for a location, the system checks the tile currently occupied by the user to determine if it contains any of the desired locations. The system then checks the tile(s) adjacent to the currently occupied tile to determine if they contain any of the desired locations. The system then checks the tile(s) adjacent to those tiles to determine if they contain any of the desired locations, and the process is repeated, as necessary, until all tiles adjacent to tiles that have been checked are themselves checked to determine if locations matching the query are present. The position of the location(s) can then be communicated to the user. Alternative methods and system-level aspects of the invention are also disclosed.11-06-2008
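The outward tile-by-tile search described in the entry above can be illustrated with a short sketch. This is only one possible reading, not the patented method: it assumes a square grid with 8-connected neighbours, and the names `find_nearest_tiles`, `tile_locations` and `max_rings` are hypothetical.

```python
def find_nearest_tiles(start, tile_locations, wanted, max_rings=10):
    """Search the occupied tile first, then successive rings of adjacent tiles.

    start: (col, row) of the tile the user is in (assumed grid coordinates).
    tile_locations: dict mapping (col, row) -> set of location categories.
    wanted: the category being searched for.
    Returns the sorted tiles of the first ring that contains a match.
    """
    seen = {start}
    frontier = [start]
    for _ in range(max_rings + 1):
        hits = [t for t in frontier if wanted in tile_locations.get(t, set())]
        if hits:
            return sorted(hits)
        next_frontier = []
        for (c, r) in frontier:
            # Visit the 8 neighbours of each tile checked in this ring.
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    n = (c + dc, r + dr)
                    if n not in seen:
                        seen.add(n)
                        next_frontier.append(n)
        frontier = next_frontier
    return []  # nothing found within max_rings rings


tiles = {(0, 0): {"bank"}, (1, 1): {"fuel"}, (3, 0): {"fuel"}}
nearest_fuel = find_nearest_tiles((0, 0), tiles, "fuel")
```

The `max_rings` cap is an addition for the sketch so that a search for a category that exists nowhere still terminates.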
20080270400DOCUMENT ANALYSIS AND RETRIEVAL - A computer system configured to implement a method for document analysis and retrieval. A document that includes text is received from a host. Document keys (i.e., keywords and keyphrases) associated with the text are generated. In first embodiments, a provided document taxonomy has categories and associated category keys (i.e., keywords and keyphrases). The category keys of each category are compared with the document keys to determine a distance between the document and each category as a measure of how close the document is to each category. A subset of the categories is returned to the host, wherein the subset of the categories reflects the determined distances. In second embodiments, a search string is created as a logical function of a subset of the document keys. The search string is submitted to a search engine. Links to related documents are received from the search engine and returned to the host.10-30-2008
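As a rough illustration of comparing document keys with category keys, the sketch below uses Jaccard distance. The abstract does not name a distance measure, so Jaccard is an assumption, as are the function names and the sample taxonomy.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical key sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def closest_categories(document_keys, taxonomy, limit=3):
    """Return up to `limit` category names, nearest first."""
    scored = sorted(
        (jaccard_distance(document_keys, keys), name)
        for name, keys in taxonomy.items()
    )
    return [name for _, name in scored[:limit]]


taxonomy = {
    "sports": ["ball", "team", "score"],
    "finance": ["bank", "loan", "rate"],
}
best = closest_categories(["team", "score", "coach"], taxonomy, limit=1)
```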
20080250017SYSTEM AND METHOD FOR AIDING FILE SEARCHING AND FILE SERVING BY INDEXING HISTORICAL FILENAMES AND LOCATIONS - A system, method, and computer-implementable method for aiding file searching within a file service by indexing historical filenames and locations. In response to receiving a request to alter a first name corresponding to a file within the file system to a second name, a file system manager associates the second name to the first name and to the file within the file system data structure. When receiving a request for a file, wherein the request includes the first name, the file system manager searches the file system for the file based on the first name. When determining the search based on the first name is not successful, the file system manager searches the file system data structure for the file based on the second name. When the second name is located within the file system data structure, the file system manager returns the file to fulfill the request.10-09-2008
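A minimal sketch of the rename-history lookup in the entry above, assuming an in-memory store. The class `RenameAwareStore` and its methods are hypothetical and stand in for the file system manager and file system data structure.

```python
class RenameAwareStore:
    """Toy file store that remembers former names of renamed files."""

    def __init__(self):
        self.files = {}    # current name -> content
        self.history = {}  # former name -> current name

    def rename(self, old, new):
        self.files[new] = self.files.pop(old)
        # Re-point earlier aliases so chains of renames still resolve.
        for alias, target in self.history.items():
            if target == old:
                self.history[alias] = new
        self.history[old] = new

    def fetch(self, name):
        # First try the name as given; fall back to the rename history.
        if name in self.files:
            return self.files[name]
        current = self.history.get(name)
        if current is not None:
            return self.files[current]
        raise FileNotFoundError(name)


store = RenameAwareStore()
store.files["a.txt"] = "payload"
store.rename("a.txt", "b.txt")
store.rename("b.txt", "c.txt")
```

After the two renames, a request for `a.txt` or `b.txt` still resolves to the file now called `c.txt`.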
20090012957SYSTEM AND METHOD FOR SEARCHING STRINGS OF RECORDS - System and method for detecting the inclusion of strings of words (records) in an input string of words. In a preparation phase, the records are pre-processed. Each record is represented by a string of chunks, each chunk composed of a pre-defined number of words. Each chunk found in at least one record is assigned a number of attributes, such as a “Begin of Record” attribute and an “End of Record” attribute. In the searching phase the input string is also divided in chunks, and for each input chunk, an Incremental Hash Function (IHF) is calculated for comparing with a prerecorded value ΔI. If the two values IHF and ΔI coincide for matching chunks with certain predefined attributes, a “probable match” is set, indicating a very high probability that a chunk was found in the records.01-08-2009
20090049043METHOD AND APPARATUS FOR PROVIDING TRAFFIC-BASED CONTENT ACQUISITION AND INDEXING - A method and apparatus for processing packets in a network are disclosed. For example, the method scans one or more packets representing a content that is being transferred via the network, where the scanning acquires one or more content elements. The method then builds a keyterm index from the one or more content elements, and stores the keyterm index in a repository. A query handler then responds to queries in accordance with the keyterm index.02-19-2009
20100057736TECHNIQUES FOR PERFORMING REGULAR EXPRESSION-BASED PATTERN MATCHING IN DATA STREAMS - Techniques for detecting patterns in one or more data streams. A pattern to be detected may be specified using a regular expression. Events received in a data stream are processed during runtime to detect occurrences of the specified pattern in the data stream.03-04-2010
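One simple way to picture regular-expression matching over events is to map each event type to a single symbol and run an ordinary regex over the resulting string. This is a batch sketch under that assumption, not the runtime stream processing the entry describes; `find_pattern` and the symbol table are hypothetical.

```python
import re


def find_pattern(events, pattern, symbols):
    """Encode events as one character each and return (start, end) index spans."""
    encoded = "".join(symbols[e] for e in events)
    return [(m.start(), m.end()) for m in re.finditer(pattern, encoded)]


symbols = {"login": "L", "fail": "F"}
events = ["login", "fail", "fail", "fail", "login"]
spans = find_pattern(events, r"F{3}", symbols)  # three consecutive failures
```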
20090070329METHOD, APPARATUS AND SYSTEM FOR MULTIMEDIA MODEL RETRIEVAL - A method, apparatus and system for multimedia model retrieval are provided. The method includes: obtaining parameters of a multimedia model to be retrieved; performing a projection on the multimedia model according to the parameters of the multimedia model so as to obtain a projection image; performing a feature extraction on the projection image; matching a feature extraction result with stored model multimedia file information to obtain a retrieval result; training a support vector machine (SVM) with the multimedia model labeled by a user upon the retrieval result as a training sample set, performing a probability-based classification on the multimedia model by the SVM, and updating the retrieval result with a classification result. The system of the present invention illustrated by embodiments achieves favorable applicability and robustness, so that users may perform a rapid and precise retrieval on massive model data in these fields.03-12-2009
20100017407THREE-DIMENSIONAL OBJECT RECOGNITION SYSTEM AND INVENTORY SYSTEM USING THE SAME - A system recognizes an arrangement of an object and counts the number of objects, comprising: a sensor for measuring a distance between the sensor and an object; a moving mechanism for moving the sensor; and a computer that is connected with the sensor and includes an object information database, an object arrangement database; a sensor data integrating section; and the sensor an object comparison calculator adapted to create an object model based on the data.01-21-2010
20090313248METHOD AND APPARATUS FOR BLOCK SIZE OPTIMIZATION IN DE-DUPLICATION - The invention provides a method and apparatus for determining sizing of chunk portions in data de-duplication. The method chunks input data into segments where each segment has a first size, assigns an identifier to each of the data segments, assigns an index to each of the identifiers, creates a suffix structure and a longest common prefix structure from the indexes, detects repeated sequences of indexes and non-repeated indexes from the suffix structure and the longest common prefix structure, determines a second size based on said detected repeated sequences and non-repeated indexes, and chunks the input data into a second plurality of data segments each having the second size.12-17-2009
20090313249CREATIVE WORK REGISTRY INDEPENDENT SERVER - An independent server (creative work protection server) that provides protection to the creative works of authors/artists by identifying web content in major third party host servers and other creative work content in database of the creative work protection server that contain similarities and reporting back to the authors/artists and major third party host servers. The creative works of the authors/artists may contain one or more of textual content, images, audio and video content. The creative work protection server has components that identify similarities to the works of the authors/artists (the creative works) and provide protection by assisting major third party host servers to delete the content, upon detection of similarities. The service to the authors/artists of the creative works is provided upon service charge basis. The creative work protection server provides provisions for registration, logging in, billing and to receive periodic results via email or webpage interface.12-17-2009
20100057733METHOD, COMPUTER PROGRAM PRODUCT, AND APPARATUS FOR ENABLING ACCESS TO ENTERPRISE INFORMATION - A method for enabling access to enterprise information may include analyzing text including a plurality of text strings and identifying a defined pattern within the text strings as corresponding to a particular entity. The particular entity may be associated with different classes of information stored in at least two respective different storage environments. The method may further include enabling provision of a selectable option providing access to one of the different classes of information from a corresponding one of the at least two respective different storage environments in response to selection of the selectable option.03-04-2010
20100057732SYSTEM AND METHOD FOR IDENTIFYING SOCIAL NETWORK INTERSECTION IN INSTANT MESSAGING - A method and computer program product for identifying one or more instant messaging contacts associated with a first instant messaging contacts list, and identifying one or more instant messaging contacts associated with a second instant messaging contacts list. The one or more instant messaging contacts associated with the first instant messaging contacts list are compared to the one or more instant messaging contacts associated with the second instant messaging contacts list. One or more comparison instant messaging contacts are determined.03-04-2010
20090216765Systems and Methods of Adaptively Screening Matching Chunks Within Documents - A computer identifies within a document multiple matching chunks in response to a search request from a user. The search request includes one or more search keywords and each of the multiple matching chunks matches at least one of the search keywords. The computer partitions the matching chunks into multiple groups. The matching chunks within a respective group have an associated matching level to the search request. The computer returns one or more groups of the matching chunks to the user in an order consistent with their respective matching levels to the search request.08-27-2009
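The grouping of matching chunks by matching level might look like the following sketch. It assumes, for illustration only, that the matching level is the number of distinct search keywords a chunk contains; the name `group_chunks` is hypothetical.

```python
def group_chunks(chunks, keywords):
    """Group chunks by how many distinct keywords they contain, best group first."""
    wanted = {k.lower() for k in keywords}
    groups = {}
    for chunk in chunks:
        level = len(wanted & set(chunk.lower().split()))
        if level:  # chunks matching no keyword are dropped
            groups.setdefault(level, []).append(chunk)
    return [groups[level] for level in sorted(groups, reverse=True)]


ranked = group_chunks(
    ["red apple pie", "green apple", "blue sky"],
    ["apple", "pie"],
)
```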
20090216764Systems and Methods of Pipelining Multiple Document Node Streams Through a Query Processor - A computer identifies a first candidate document at a first data source and a second candidate document at a second data source in response to a request from a user, wherein the request includes one or more keywords. The computer generates a first node stream for the first candidate document and a second node stream for the second candidate document using data packets received from the respective first and second data sources. The computer alternatively processes the first node stream and the second node stream until a candidate chunk matching at least one of the keywords is identified therein, wherein the matching chunk includes a set of nodes within a respective data source.08-27-2009
20090216763Systems and Methods of Refining Chunks Identified Within Multiple Documents - After receiving a first user request including a first set of search keywords, a computer identifies a first set of chunks within multiple documents, wherein each chunk includes terms matching the first set of search keywords, and displays at least a portion of the first set of chunks, including highlighting the terms matching the first set of search keywords in the displayed portion in a first manner. After receiving a second user request to search among the documents for documents that satisfy a second set of search keywords, the computer identifies a second set of chunks within the documents, wherein each chunk includes terms matching the second set of search keywords, and displays at least a portion of the second set of chunks, including highlighting the terms matching the second set of search keywords in the displayed portion in a second manner that is different from the first manner.08-27-2009
20100057737DETECTION OF NON-OCCURRENCES OF EVENTS USING PATTERN MATCHING - Techniques for detecting non-occurrence of an event within a time period following the occurrence of another event. In one embodiment, language extensions are provided to a language that enable queries to be formulated for detecting non-occurrences using that language.03-04-2010
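Non-occurrence detection can be sketched without any language extensions as a scan over timestamped events. The function `missing_followups` and the event names are hypothetical, and the sketch works on a finished list rather than a live stream.

```python
def missing_followups(events, trigger, expected, window):
    """Return timestamps of `trigger` events not followed by `expected` in time.

    events: list of (timestamp, name) tuples sorted by timestamp.
    window: maximum allowed delay between trigger and expected event.
    """
    missing = []
    for i, (t0, name) in enumerate(events):
        if name != trigger:
            continue
        found = any(
            n == expected and t0 < t <= t0 + window
            for t, n in events[i + 1:]
        )
        if not found:
            missing.append(t0)
    return missing


log = [(0, "order"), (5, "ship"), (10, "order"), (30, "ship")]
late = missing_followups(log, "order", "ship", window=10)
```

Note that a trigger near the end of the list is reported even if its window has not yet elapsed; a streaming version would need a timer to decide when the window closes.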
20090024624Determining top combinations of items to present to a user - Embodiments of the present invention pertain to determining top combinations of items to present to a user. According to one embodiment, data that includes information describing a plurality of combinations of records is accessed. Each record describes a plurality of items. The data is analyzed using a branch and bound search procedure to determine top combinations of items based on a specified metric and a specified number. According to one embodiment, the metric is value enabled and the specified number determines how many combinations of items are associated with the top combinations of items.01-22-2009
20090006396CONTEXTUAL SEARCH - A method of mobile communication advertising, having steps of entering one of a primary function keyword and a vanity keyword to a mobile communication device, entering a search term in a form of a message string into a data processing module, creating a list of category aliases, creating a list of category names, comparing the list of category aliases to the message string for a length, a category alias and a category name, identifying matches between the category aliases and the message string, wherein matches are placed into a search category list, removing matched category phrases from the search message to leave advertising search words, determining a search sub-module based upon the advertising search words, conducting a search using the sub-module; and sending a search result obtained from the sub-module to a mobile communication device, receiving the search result from the sub-module to a mobile communication device, and displaying the search result on the mobile communication device.01-01-2009
20100005096Document type identifying method and document type identifying apparatus - A document type identifying apparatus includes in advance a database storing therein keywords used as keys that identify document types in association with each document type. The document type identifying apparatus aligns word strings written on a document and generates partial keyword strings for each keyword by using the keywords stored in the database. The partial keyword strings are to be checked for matching with the word strings written on the document. Then, the document type identifying apparatus checks matching of the grouped and aligned word strings with the partial keyword strings and obtains, for each keyword, each number of matched words with the highest matching rates between the grouped word strings that are successfully matched and the partial keyword strings. Then, each number of matched words is used to calculate each evaluation value to determine the document type.01-07-2010
20100005095Method and Apparatus for Defining Data of Interest - Some embodiments of the invention include tools for extracting data of interest from the world wide web (WWW). The extraction is accomplished using descriptions of data of interest. The descriptions of data of interest can include computer programs comprising a sequence of instructions and extractor patterns. The extractor patterns can be developed interactively using a web browser integrated into the graphical development environment for creating the descriptions of data of interest. The instructions can be selected from a predetermined list of instructions designed for extracting information from the WWW. The descriptions of data of interest can be grouped into categories sharing common query elements. Multiple descriptions of data of interest in the same category can be executed simultaneously using the same query. The descriptions of data of interest can be accessed by a client computer using a web browser to initiate a query. In some embodiments, the descriptions of data of interest are used to provide information about products available for sale over the WWW.01-07-2010
20080243844INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD - Provided is an information search technique that can realize a short response time for a search request. To realize this object, a search client apparatus according to the invention is connected to a network system having a metadata search apparatus and a plurality of image search apparatuses, and comprises: a unit configured to transmit a search condition to the metadata search apparatus; a unit configured to group a plurality of image data, which has been obtained as a result of the search by the metadata search apparatus, in accordance with the number of image search apparatuses and allocate each of the image search apparatuses to each group; and a unit configured to transmit the search condition to each of the plurality of image search apparatuses and instruct to execute search processing on the allocated group of image data among the plurality of image data obtained by the search.10-02-2008
20090144279Method for improving search efficiency in enterprise search system - A search system with a search engine applies a user search query to an index of the documents stored in a document repository for returning a result set of matching documents to a user. In order to more efficiently access, search and retrieve documents stored in document repositories, one of a document repository and an index thereof or both are partitioned in one or more dimensions, and a partition is configured in a specific dimension according to two or more document attribute values selected from one and the same attribute category. This shall enable a search application to access significantly less data in order to determine a search result and shall specifically improve the efficiency of enterprise search systems in a high degree.06-04-2009
20090138469METHOD OF PATTERN SEARCHING - Structural join mechanisms provide efficient query pattern matching. In one embodiment, tree-merge mechanisms are provided. In another embodiment, stack-tree mechanisms are provided.05-28-2009
20090138470METHOD OF PATTERN SEARCHING - Structural join mechanisms provide efficient query pattern matching. In one embodiment, tree-merge mechanisms are provided. In another embodiment, stack-tree mechanisms are provided.05-28-2009
20080319995RELIABILITY OF DUPLICATE DOCUMENT DETECTION ALGORITHMS - In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold.12-25-2008
20080319994METHOD FOR REGISTERING A TEMPLATE MESSAGE, GENERATING AN UPDATE MESSAGE, REGENERATING AND PROVIDING AN APPLICATION REQUEST, COMPUTER ARRANGEMENT, COMPUTER PROGRAM AND COMPUTER PROGRAM PRODUCT - The invention relates to a method for registering a template message comprising identifying a request class containing a fixed and a variable content part, generating a template message comprising the fixed content part of the request class, and registering the template message for the request class with a template message database.12-25-2008
20090112861NON-WORD OR NON-NUMBER SEARCH - A system, method, and computer program product for performing a non-word or non-number search is provided. A search object designated by a user is stored. A user criterion selected by the user for the search object is registered. A search template representing the search object is constructed. The search template embodies a first determined value of the user criterion. The search template is associated with a stored template of a stored object, the stored template embodying a second determined value of the user criterion, by comparing the first determined value to the second determined value.04-30-2009
20080228765Genetic Attribute Analysis - A method and system for genetic attribute analysis are presented in which non-identical sets of genetic attributes comprising nucleotide sequence are compared to determine whether proteins encoded by those nucleotide sequences are functionally equivalent, and therefore whether genetic information contained in the sets of genetic attributes can be considered to be the same. A determination of equivalence between two sets of genetic attributes can enable the compression of thousands of individual DNA nucleotide attributes into a single categorical genetic attribute, which is useful for methods such as attribute discovery, predisposition prediction and predisposition modification where a reduction in the amount of genomic data can enhance processing efficiency. Sets of genetic attributes are determined to be equivalent based on whether they are able to satisfy one or more equivalence rules.09-18-2008
20080215582Generating Statistics on Text Pattern Matching Predicates for Access Planning - Statistics for a pattern matching predicate are generated using stored character statistics. A first structure stores, for each of a plurality of character positions, frequently occurring characters in that character position, and a count of the number of occurrences of that character. A second structure stores frequently occurring characters that are subsequent to the frequently occurring characters stored in the first structure, and a probability of occurrence of each frequently occurring subsequent character. To form an estimate of the number of tuples matching a pattern matching predicate, statistics are retrieved for the matching characters in each matching position in the predicate, and then combined to produce the estimate. In the event a statistic is not stored for a desired character, the available statistics are used to make an estimate by accumulating statistics for other characters, and then calculating average frequency of occurrence of characters that do not have stored statistics.09-04-2008
20080275874Supplier Deduplication Engine - Disclosed herein is a method of grouping similar supplier names together in a database. The syntactical errors in the supplier names are corrected. The supplier names are grouped after correcting the syntactical errors. The abbreviations in the supplier names are captured. The ordering, pronunciation and stemming errors in the supplier names are corrected. A matching algorithm that matches and compares two supplier names is applied that comprises the steps of grouping supplier names based on the first set of characters in the supplier names and calculating a matching score between the two supplier names using the Levenshtein distance between the two supplier names, along with the supplier names' sound codes obtained from a modified metaphone algorithm, length of each word, position of matching and mismatching characters, and stem of words in the supplier names. The matching scores are compared with set thresholds in order to further group the supplier names into clusters.11-06-2008
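The Levenshtein component of the matching score above can be sketched as follows. This covers only the edit-distance part plus a first-character grouping check; the metaphone sound codes, stemming and positional features are omitted. The normalisation to a 0..1 score, the greedy clustering, and the threshold of 0.8 are assumptions made for the example.

```python
def levenshtein(a, b):
    """Classic edit distance using two rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Case-insensitive score in [0, 1]; 1.0 means identical."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def cluster(names, threshold=0.8):
    """Greedy clustering: join the first group whose representative is close enough."""
    clusters = []
    for name in names:
        for group in clusters:
            rep = group[0]
            same_initial = rep[:1].lower() == name[:1].lower()
            if same_initial and similarity(rep, name) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


groups = cluster(["Acme Corp", "Acme Corp.", "Bolt Ltd"])
```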
20080270396INDEXING VERSIONED DOCUMENT SEQUENCES - A method includes indexing text that is repeated in multiple edited versions of a document a single time, thereby generating a compact index, and conducting text searches in the compact index.10-30-2008
20080270395Relevance Bar for Content Listings - A client computer receives a set of search results ordered based on scheduled time of broadcast associated with respective listings of content. The listings of content include data representing time-bounded events. The client computer displays or otherwise presents a set of objects along an axis of a display bar, the objects corresponding to subsets of the search results. For example, the axis is associated with a unit of time or a unit of relevance.10-30-2008
20080256070Data Collection Cataloguing and Searching Method and System - The present invention relates to a method of cataloguing a data structure and also preferably a method of searching through such a data structure to detect the presence of search patterns within the data structure. The cataloguing method of the present invention employs the formation of a catalogue data structure which is used to associate data items (transformed from data elements present within the data collection) with storage addresses. This catalogue data structure may be sorted to facilitate searching through same. Such searches may be completed through the formation of a plurality of search queries from a received search pattern sequence where the results of running these search queries may then be subsequently considered in conjunction with a search pattern sequence detection process.10-16-2008
20080256071Method And System For Selection Of Text For Editing - A method of selection of text for editing is provided. The method includes inputting text to an apparatus and generating a label for at least one unit of the text as the text is being input to the apparatus. Accordingly, a user is able to select the at least one text unit for editing by selecting the corresponding label of the text unit.10-16-2008
20080250016OPTIMIZED SMITH-WATERMAN SEARCH - An optimized database searching for a query sequence having a plurality of vectors arranged in a linear fashion, wherein the vectors are parallel to a query sequence, and a plurality of elements of the query sequence are reordered in a striped pattern, and wherein a set of dynamic programming scoring results are reported for further processing.10-09-2008
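For orientation, the plain (scalar) Smith-Waterman local-alignment score that such optimizations accelerate can be written as below. This sketch does not implement the striped vector layout described in the entry, and the scoring parameters (match 2, mismatch -1, linear gap -2) are arbitrary choices for the example.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score using a linear gap penalty."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        cur = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Scores never drop below zero: that is what makes the alignment local.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


exact = smith_waterman_score("ACGT", "ACGT")
embedded = smith_waterman_score("ACGT", "TTACGTTT")
```

Both calls score 8 because the four-character query aligns exactly, either with the whole target or with a substring of it.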
20090300014MEMBERSHIP CHECKING OF DIGITAL TEXT - The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.12-03-2009
20090300013Optimized Reverse Key Indexes - Aspects of the subject matter described herein relate to optimized reverse key indexes. In aspects, a dispersion function disperses index values such that they are distributed across multiple pages of an index. The dispersion function utilizes a dispersion factor that indicates to what extent the index values are dispersed. Because the index values are dispersed, contention regarding inserts may be reduced or eliminated and other advantages realized.12-03-2009
20090292702Acquisition and association of data indicative of an inferred mental state of an authoring user - A computationally implemented method includes, but is not limited to: acquiring data indicative of an inferred mental state of an authoring user; and associating the data indicative of the inferred mental state of the authoring user with an electronic message. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.11-26-2009
20090292700SYSTEM AND METHOD FOR SEMI-AUTOMATIC CREATION AND MAINTENANCE OF QUERY EXPANSION RULES - A system and method enable semi-automated generation of query expansion rules for searching a knowledge base. Candidate synonymy pairs are automatically extracted from queries made by users when searching a knowledge base. Synonymy rules are defined, based on the extracted candidate synonymy pairs, and may be context dependent. Query expansion rules based on the defined synonymy rules can then be exported to a storage medium for use in expansion of new user queries when searching the knowledge base.11-26-2009
20090292699Nucleotide and amino acid sequence compression - A biomolecular sequence database is encoded using a set of byte-aligned block codes. Some of the block codes encode a portion of a current sequence by pointing to an identical portion of another sequence. Others of the block codes are run length codes. Multiple different ways of encoding a current sequence using different ones of the block codes are determined. Dynamic programming is used to determine which one of these ways most efficiently encodes the current sequence into the shortest string of block codes. Each sequence in the database is encoded as such a string of block codes.11-26-2009
20100030779 SYSTEM AND METHOD FOR IDENTIFYING AND LINKING USERS HAVING MATCHING CONFIDENTIAL INFORMATION - A system and method for identifying and linking users having matching confidential information, the system comprising: a database for storing user information in a manner such that one or more data fields representing at least a portion of the user information are associated with respective confidentiality levels; a matching engine for identifying matching information in the data fields; and a linking unit for linking two or more users associated with matching information identified by the matching engine.02-04-2010
20100030778RETRIEVING AND SHARING ELECTRONIC DOCUMENTS USING PAPER - In an embodiment of the invention, an electronic document (e-document) can be searched and found by capturing an image of the printed document. Instead of typing in a file name or searching through multiple directories, the user simply takes a picture of the document with a camera and the system uses the document image to locate the e-document. In an alternative embodiment of the invention, an image of a printed document can be useful for remote document sharing. In various embodiments of the invention, sharing an image of a printed document can be used to email a high quality paper document, send a high quality fax, or open a document to a page containing an annotation. Through co-design of the feature extraction and search algorithm in the system, the image feature detection robustness and search speed are improved at the same time.02-04-2010
20100023517METHOD AND SYSTEM FOR EXTRACTING DATA-POINTS FROM A DATA FILE - The present invention provides a method, system and computer program product for extracting data-points from a data file. A data-point is extracted in a data-base by pointing at a portion of computer recognizable text in the data file by a pointing device. The data-point associated with the pointed portion of the computer recognizable text is thereby selected and stored in the database.01-28-2010
20100023516INFORMATION HANDLING - An information handling apparatus in which textual metadata is generated in respect of a current information item in an ensemble of information items including: a mechanism that detects one or more predetermined properties of the current information item; a mechanism that detects a subset of information items from the ensemble of information items the subset being those which have the one or more predetermined properties most similar to those of the current information item; and a mechanism that selects one or more most frequently occurring words and/or phrases within textual metadata associated with the subset of information items, for use in textual metadata to be associated with the current information item.01-28-2010
20100023515DATA CLUSTERING ENGINE - A method of managing a plurality of records, in which each record comprises a plurality of fields, involves determining a match signature for each record by evaluating a deterministic cluster definition against each record. The deterministic cluster definition comprises a logical association of the fields and defines at least one data cluster of the records. The data clusters are then identified by populating a match table with the match signatures. Each match signature is unique within the match table. Each record is associated with a respective one of the match signatures that are populated in the match table.01-28-2010
20100017405SYSTEM AND METHOD FOR IMPROVING NON-EXACT MATCHING SEARCH IN SERVICE REGISTRY SYSTEM WITH CUSTOM DICTIONARY - A system and associated method for searching a service registry system with a service name. The service registry system receives a request to search a service description with the service name. The service registry system first searches a registry for a service identifier that matches the service name in the request, and if there is no matching service identifier, the service registry system composes at least one candidate service name from synonyms of respective component words that are comprised by the service name. The service registry system performs another search in the registry with said composed candidate service name for all composed candidate service names and service descriptions associated with any match are returned.01-21-2010
20090119292PEER TO PEER TRAFFIC CONTROL METHOD AND SYSTEM - A system, apparatus, and method for controlling peer to peer traffic at a network gateway or server. Suspected peer to peer traffic is identified heuristically and collected for content analysis. Content digital fingerprint pattern matching software is received from a remote server. Peer to peer traffic is selectively disposed of.05-07-2009
20090083266TECHNIQUES FOR TOKENIZING URLS - Techniques are described for tokenizing a corpus of URLs of web documents. URLs are first tokenized based upon specified generic delimiters to form components. The components are then tokenized using website-specific delimiters. Website-specific delimiters are any non-alphanumerical symbol or a unit change that is specific to a particular website. Support for website-specific delimiters and the tokens resulting from website-specific delimiters are calculated. Support values for website-specific delimiters and the tokens above a specified threshold value are valid. Tokenization may also be performed by generating a graph of the corpus of URLs of web documents. Each node of the graph represents a token and each edge represents a delimiter of the URLs. The graph is traversed and the support of the edges are compared to a specified threshold value. If the support of an edge of a node is greater, then the token corresponding to the node is valid.03-26-2009
20090171953TECHNIQUES FOR RECOGNIZING MULTIPLE PATTERNS WITHIN A STRING - Techniques for recognizing multiple patterns within a string of characters are presented. A dictionary is hierarchically organized, such that leaf nodes within the dictionary represent words defined in the dictionary. A string of characters is received. Each character within the string is traversed by attempting to match it with a character defined in the dictionary. As long as a match continues with the dictionary, the characters within the string are traversed. Once a longest possible match to a word within the dictionary is found, the next character following the last matched character for the string is processed.07-02-2009
20090083265COMPLEX REGULAR EXPRESSION CONSTRUCTION - A mechanism is provided to facilitate complex textual pattern matching. Regular expressions are specified utilizing a set of rules of various simplicity/complexity. These rules are subsequently employed to generate a more complex regular expression described by the rules, which can be passed to a regular expression engine to identify textual patterns as a function thereof.03-26-2009
20090100053Semantic matching using predicate-argument structure - The invention relates to topic classification systems in which text intervals are represented as proposition trees. Free-text queries and candidate responses are transformed into proposition trees, and a particular candidate response can be matched to a free-text query by transforming the proposition trees of the free-text query into the proposition trees of the candidate responses. Because proposition trees are able to capture semantic information of text intervals, the topic classification system accounts for the relative importance of topic words, for paraphrases and re-wordings, and for omissions and additions. Redundancy of two text intervals can also be identified.04-16-2009
20080294637Web-Based User-Interactive Question-Answering Method and System - The disclosed subject matter consists of a system, a website, and their supporting methods for user-interactive question answering. The system consists of a pattern database to store question/answer patterns for users to select when asking/answering questions. Each question pattern may include or be associated with an answer pattern. The system also includes an asking unit to let the users ask questions with or without question patterns and processes users' questions. The system also includes an answering unit to let the users answer questions with or without answer patterns. The invented method and system can significantly improve users' questioning and answering efficiency and also help machines improve the accuracy of processing the questions and answers and accumulate useful knowledge.11-27-2008
20090171958Computer-Based System and Method for Generating, Classifying, Searching, and Analyzing Standardized Text Templates and Deviations from Standardized Text Templates - A method for generating, classifying, searching, and analyzing standardized text templates drawn from a plurality of text documents and for identifying standardized text deviations from standardized text templates. Semi-standardized documents may be represented as standardized templates and deviations from standardized templates, with such templates themselves automatically generated by a computer-implemented method from a plurality of similar text documents. The method enables enhanced analysis of semi-standardized documents and automatic extraction of information from standardized text templates.07-02-2009
20090171956TEXT CATEGORIZATION WITH KNOWLEDGE TRANSFER FROM HETEROGENEOUS DATASETS - The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification. A plurality of heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed after receiving input text data. A plurality of features are extracted from each of the plurality of heterogeneous auxiliary datasets. The plurality of features are combined with the input text data to generate a set of features which may potentially be used to classify the input text data. Classification features are then extracted from the set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value associated with each feature in the set of features and identifying features having a mutual information value exceeding a threshold value.07-02-2009
20090030903AUTOMATED COLLATION CREATION - A collation creation process is provided to automatically establish collation support for sorted linguistic data. The sorted linguistic data is examined to determine if it matches an existing collation support. If not, a new collation support is created for the sorted linguistic data. The provider of the sorted linguistic data may participate in the collation creation process by answering queries concerning the sorted linguistic data. The provider's input is integrated into the sorted linguistic data before the collation creation process is applied to the sorted linguistic data. A user interface is provided that enables the interaction between the provider of the sorted linguistic data and the collation creation process. The user interface provides visual cues identifying distinctions among the strings in the sorted linguistic data.01-29-2009
20090300012MULTILEVEL INTENT ANALYSIS METHOD FOR EMAIL FILTRATION - A method for filtering email which contains links to uniform resource identifiers which disguise the content and identity of spam sites by multiple serial redirection.12-03-2009
20090070327METHOD FOR AUTOMATICALLY GENERATING REGULAR EXPRESSIONS FOR RELAXED MATCHING OF TEXT PATTERNS - A method for automatically generating regular expressions for relaxed matching of text patterns. A received input phrase expressed in a natural language is determined to be a plain text pattern. The plain text pattern is automatically tokenized, thereby generating a first token list. Rules loaded from a predefined rule set are automatically applied to the first token list in an order specified by the predefined rule set to automatically modify a token list by applying a replace word, split-at-character or whitespace operator. The modified token list is automatically converted into a regular expression that matches the plain text pattern and one or more variations of the plain text pattern. A utilization of the regular expression for an information extraction facilitates a recall and a precision of the information extraction.03-12-2009
20080281816Dynamic Keyword Processing System and Method For User Oriented Internet Navigation - A system and method are described that enable users to navigate the web according to the user's own keyword definitions for web sites, keyword extraction and processing from the web site the user is visiting, the user's selection of keyword categories, and a mapping between e-mail addresses and URLs. The user's own keyword definition is a user-driven keyword naming scheme, the opposite of keyword domain services, which were driven by the service company. The user's selection of keyword categories gives users a choice of keyword categories and groups. Keyword extraction and processing extracts keywords from the visited page and arranges related keywords in preparation for anticipated search and navigation from the user's current web site and keyword. The mapping system converts an e-mail address into a URL so that it can be used as a domain name.11-13-2008
20090171954FREQUENT PATTERN ARRAY - Machine readable media, methods, and computing devices are disclosed that mine a dataset using a frequent pattern array. One method includes building a frequent pattern tree comprising a plurality of nodes to represent frequent transactions of a dataset that comprises one or more items. The method also includes transforming the frequent pattern tree to a frequent pattern array that comprises an item array and a plurality of node arrays, the item array comprising frequent transactions of the dataset and each node array to associate an item of the dataset with one or more frequent transactions of the item array. The method further includes identifying frequent transactions of the dataset based upon the frequent pattern array.07-02-2009
20090132532PROCEDURE GENERATION APPARATUS AND METHOD - A procedure generation apparatus has, in a storage unit thereof, a database in which a name of input information and a name of output information are stored, associated with a name of a work. The procedure generation apparatus retrieves one or more candidate work names associated with an input information name from the database, displays the retrieved one or more work names, receives a selection of a work name from among the displayed one or more work names, retrieves one or more candidate output information names associated with the selected work name from the database, displays the retrieved one or more output information names, receives a selection of an output information name from among the displayed output information names, retrieves one or more candidate input information names each having a similar name to the selected output information name, from the database, and displays the retrieved input information name.05-21-2009
20090132531SEQUENTIAL PATTERN EXTRACTING APPARATUS - Constraining sequential data expressing sequential data which a sequential pattern to be extracted must include is specified in advance. Sequential pattern candidates with sequence length 05-21-2009
20090132530WEB CONTENT MINING OF PAIR-BASED DATA - Described herein is technology for, among other things, mining pair-based data on the web. The technology involves an online pair-based data mining system as well as an offline SVM training system. By subjecting a pair-based input data to the systems, one may grow a pool of pair-based data which share characteristics of the pair-based input data in more efficient manner.05-21-2009
20100049713PATTERN MATCHING DEVICE AND METHOD - Provided is a pattern matching device comprising memories. On each of the combinations of the values of an N number (N: a natural number) of pattern detection signals outputted from a circuited NFA (Non-deterministic Finite Automaton), the memories store both identifiers indicating patterns corresponding to effective patterns of the N number of pattern detection signals and flags indicating the definitions of the combinations, individually in addresses set according to the combinations. Further comprised are an address creating unit for determining the address of the memory corresponding to the combination of the values of the pattern detection signals, by using the combination of the values of the pattern detection signals outputted from the circuited NFA, and a read control unit for reading the identifiers and the flags stored in the address from the memories while incrementing the addresses determined by the address creating unit, until the flags take a specific value.02-25-2010
20100049712SEARCH METHOD AND SEARCH PROGRAM - A search device creates as many stack frames as the number obtained by adding one to the number of search condition character strings contained in an out-of-search-condition character string in a stack, sequentially inputs character strings in a text into automaton data, determines whether the character strings in the text hit the search condition character string or the out-of-search-condition character string to push correspondence to the stack or to change correspondence into non-correspondence, and determines whether the text is to be searched.02-25-2010
20100049709Generating Succinct Titles for Web URLs - Methods, computer programs, and systems for generating a link title for a URL (Uniform Resource Locator) within a context webpage to be shown as a web result are provided. The method evaluates generation parameters for a plurality of sources for picking words from the link title. Further, the method generates candidates for the link title, and a likelihood is computed for each candidate. When computing the likelihood, the generation parameters, the context webpage and the words are considered. In addition, the method selects a candidate with the highest likelihood from all the computed likelihoods, and presents the URL with the selected candidate as the title.02-25-2010
20100017406DOCUMENT PROCESSING DEVICE AND PROGRAM - A switching information acquiring unit 01-21-2010
20090089287Automatically verifying that anti-phishing URL signatures do not fire on legitimate web sites - A method and computer program product prevent false positives from occurring by reducing or preventing legitimate web site content from triggering matches to phishing black lists, but provides time and cost savings over manual review of black lists. A method implemented in a computer system for detecting false positives among a plurality of search patterns of web sites that include illegitimate content comprises accessing a first page of a legitimate web site, obtaining all links included in the first page, for each link included in the first page that points to a page on the web site, determining whether the link matches at least one of the plurality of search patterns, and for each link that matches the search pattern, indicating that the search pattern is a false positive.04-02-2009
20100010993Distributed personal information aggregator - A method of aggregating personal information available from public sources over a network. The method includes the steps of receiving at a computer server, data associated with a person, the data being publicly available over a network, and including at least a first name and a last name; using a processor to compare the received data to a plurality of data profiles stored in a database of one or more memory devices, each profile corresponding to a previously-profiled person and containing data associated with the previously-profiled person; determining whether the received data sufficiently matches data associated with the previously-profiled person of the data profile; and merging the received data with the data associated with the previously-profiled person.01-14-2010
20100010995METHODS OF CODING AND DECODING, BY REFERENCING, VALUES IN A STRUCTURED DOCUMENT, AND ASSOCIATED SYSTEMS - The present invention concerns a method of coding an XML-type structured document, a corresponding decoding method and associated systems.01-14-2010
20100010994MOBILE APPLICATION DISCOVERY THROUGH MOBILE SEARCH - A social mobile network enables discovery of application programs running on the mobile devices. A search for partial or full matches to a group of alphanumeric characters is performed on the data stored on the first mobile communication device on which the search is initiated. The search is also performed on data made available to the user of the first mobile communication device by other users, where each user is associated with a different one of a multitude of mobile communication devices. The sharing of the data and the search for shared data is made via a server with which the mobile communication devices are in communication. The discovery of applications whose names or descriptions are partially or fully matched to the alphanumeric characters is made despite the fact that the user was not looking for or aware of the existence of such application programs.01-14-2010
20100010992Methods And Systems For Resolving A Location Information To A Network Identifier - Methods and systems are described for resolving location information to a network identifier. In one embodiment, a method includes receiving information identifying a geospatial query region. The method also includes generating a query message including an outside-scope, unicast identifier identifying a zone corresponding to a zone region at least partially present in the query region. The method also includes sending the query message to a border node having an outside network interface for receiving the query message and an inside network interface in a network path including a network interface in the zone. The method also includes receiving a response identifying a node having a network interface in the zone.01-14-2010
20090019045SYNTHESIZING INFORMATION-BEARING CONTENT FROM MULTIPLE CHANNELS - A computing system and method receive a query; separate a plurality of information sources into individual elements of content (EOC); tag each EOC with metadata that indicate source, date, and other relevant information; pattern match each EOC; calculate the respective distance function from every EOC to every other EOC; and output EOC to a set of virtual buffers (01-15-2009
20090019044PATTERN SEARCH APPARATUS AND METHOD THEREOF - A pattern search apparatus includes a storage unit, a distribution acquisition unit, a hash function unit, a training unit, and a search unit. A cumulative probability distribution of the training pattern on an arbitrary axis is obtained, and hash functions, each of which divides a probability value, are defined based on the cumulative probability distribution.01-15-2009
20090019043DATA MINING METHOD FOR FINDING DEVIATIONS IN DATA - Methods and apparatus, including computer program products, implementing and using techniques for finding deviations in data. A set of candidate patterns is generated. A set of exception patterns that occur in the data less frequently than expected assuming statistical independence is selected from the set of candidate patterns. Data records that comply with at least one of the exception patterns are processed as exception candidates.01-15-2009
20090019042Method Of Screening Compound Regulating The Translation Of Specific mRNA - The present invention provides a screening method for a compound capable of preventing/treating a particular disease, which comprises (1) a step for selecting one or more mRNAs whose regulation of translation can result in the prevention/treatment of the disease, (2) a step for searching for mRNA(s) having a sequence, in the molecule, capable of assuming a local secondary structure capable of regulating the translation of the mRNA(s) from among the mRNAs selected in (1) above, (3) a step for selecting a particular sequence from among sequences capable of assuming a local secondary structure in the particular mRNA extracted in (2) above, (4) a step for confirming that a partial structure capable of interacting with the compound in the particular sequence selected in (3) above is not present in the region involved in the regulation of the translation of other mRNA, or even if present, is incapable of regulating the translation, and (5) a step for bringing an RNA strand comprising a particular sequence confirmed to be not present in the region involved in the regulation of the translation of other mRNA in (4) above and a test compound into contact with each other, and measuring changes in the stability of the secondary structure of the RNA strand.01-15-2009
20080301134SYSTEM AND METHOD FOR ACCELERATING ANCHOR POINT DETECTION - A sampling based technique for eliminating duplicate data (de-duplication) stored on storage resources is provided. According to the invention, when a new data set, e.g., a backup data stream, is received by a server, e.g., a storage system or virtual tape library (VTL) system implementing the invention, one or more anchors are identified within the new data set. The anchors are identified using a novel anchor detection circuitry in accordance with an illustrative embodiment of the present invention. Upon receipt of the new data set by, for example, a network adapter of a VTL system, the data set is transferred using direct memory access (DMA) operations to a memory associated with an anchor detection hardware card that is operatively interconnected with the storage system. The anchor detection hardware card may be implemented as, for example, an FPGA to quickly identify anchors within the data set. As the anchor detection process is performed using a hardware assist, the load on a main processor of the system is reduced, thereby enabling line speed de-duplication.12-04-2008
20080301133LOCATION RECOGNITION USING INFORMATIVE FEATURE VOCABULARY TREES - A location recognition technique that involves using a query image to identify a depicted location is presented. In addition to the query image, there is also a pre-constructed database of features which are associated with images of known locations. The technique matches features derived from the query image to the database features using a specialized vocabulary tree, which is referred to as an informative feature vocabulary tree. The informative feature vocabulary tree is specialized because it was generated using just those database features that have been deemed informative of known locations. The aforementioned matching features are used to identify a known location image that matches the query image. The location associated with that known location image is then deemed to be the location depicted in the query image.12-04-2008
20080301135EVENT PROCESSING QUERY LANGUAGE USING PATTERN MATCHING - An event processor can use event processing queries to operate on an event stream. Event processing queries can include a “matching” function that matches a pattern in the event stream.12-04-2008
20080250018Binary function database system - A binary function database system is provided in which binary functions are extracted from compiled and linked program files and stored in a database as robust abstractions which can be matched with others using one or more function matching heuristics. Such abstraction allows for minor variations in function implementation while still enabling matching with an identical stored function in the database, or with a stored function with a given level of confidence. Metadata associated with each function is also typically generated and stored in the database. In an illustrative example, a structured query language database is utilized that runs on a central database server, and that tracks function names, the program file from which the function is extracted, comments and other associated information as metadata during an analyst's live analysis session to enable known function information that is stored in the database to be applied to binary functions of interest that are disassembled from the program file.10-09-2008
20080243843Predisposition Modification Using Co-associating Bioattributes - A bioinformatics method, software, database and system are presented in which attributes that modify an individual's predisposition for association with a query attribute (i.e., an attribute of interest) are identified. A minimum strength of association value serves as a statistical threshold to ensure the results will provide at least a minimum degree of certainty that the individual will acquire an association with the query attribute upon modifying their attribute profile with the identified attributes.10-02-2008
20080235227SYSTEMS AND METHODS TO EXTRACT DATA AUTOMATICALLY FROM A COMPOSITE ELECTRONIC DOCUMENT - A system and method for automatically extracting contract data from electronic contracts includes an administrator module configured to provide templates for inputting document patterns and a list of contract data tags for each of a plurality of contract document types. A parser is configured to convert an electronic contract document into a contract text document and reformat the contract text document to provide a pattern for the text contract document. A pattern recognition engine is configured to determine a list of contract document types in the electronic contract by comparing and matching patterns of all known contract document types with the pattern of the contract text document. A contract data extraction engine is configured to extract contract data for each contract document type on the list.09-25-2008
20080228766Efficiently Compiling Co-associating Attributes - A method, software, database and system are presented in which attribute profiles of query-attribute-positive individuals and query-attribute-negative individuals are compared, and combinations of attributes that occur at a higher frequency in the group of query-attribute-positive individuals are identified and stored to generate a compilation of attribute combinations that co-associate with the query attribute (i.e., an attribute of interest). Several computationally efficient approaches for identifying the attribute combinations are incorporated.09-18-2008
20080222147WORKFLOW SYSTEM MATRIX ORGANIZATION SEARCH ENGINE - A rule-based search engine is used in conjunction with an automated network-based workflow system (which in turn is interfaced with an organizational database) to efficiently determine service routing requests from users/clients. The search engine employs search techniques adapted for use with multi-dimensional tree structures that define the matrix organizational model. Workflow services are preferably represented by roles that can be used to represent workflow actors in the workflow routing rules. These roles are preferably evaluated at run-time to best match recipients depending on the organization context from which the routing request is made.09-11-2008
20100057735FRAMEWORK FOR SUPPORTING REGULAR EXPRESSION-BASED PATTERN MATCHING IN DATA STREAMS - Techniques for detecting patterns in one or more data or event streams. A pattern to be detected may be specified using a regular expression. Events received in a data stream are processed during runtime to detect occurrences of the specified pattern in the data stream. In one embodiment, a pattern type or class is determined for the specified pattern and pattern matching is performed using a technique selected based upon the type or class determined for the specified pattern.03-04-2010
20080208855METHOD FOR MAPPING A DATA SOURCE TO A DATA TARGET - The invention relates to a method for mapping at least one data column from a database source to at least one data column of a data target, the method comprising: defining at least one reference column of the data target and at least one database source column; performing a comparison of data contained in the data column(s) with the reference column(s); and determining mapping candidates between the data column(s) and the reference column(s).08-28-2008
20080201329Method And The Associate Mechanism For Stored-Image Database-Driven Spectacle Frame Fitting Services Over Public Network - A method of spectacle frame fitting over public network, such as Internet, based upon database of product information and digitized user images as acquired via devices connected to computer. Particularly, consumers can take advantage of present method to choose spectacle frames from wide variety of selections, expeditiously by the use of public computer network (Internet). Consumers may use digital cameras, network cameras or scanned photos to submit facial image, and by way of calibration steps such as the gap between two pupils aligned to the marked pupil point of the stored spectacle images, the suitable size of the spectacle frame can then be determined for best fit to the facial image. With the accessibility and availability of the Internet, just a few clicks on the mouse enable the consumers to choose spectacle frames of their preference and in a way that affords wide selections at low costs and easy access.08-21-2008
20090094237Methods, Systems, and Computer Program Products for Generating Data Quality Indicators for Relationships in a Database - The disclosed methods, systems, and computer-program products allow a business to generate data quality indicators for relationships in a database. In an embodiment, one or more relationships linked to a customer are retrieved from a database to form a set of relationships. A match confidence code is generated for each relationship based on a score generated by the comparison of customer data associated with the respective relationship and corresponding customer data obtained from an external industry database. A link confidence code is subsequently determined for the customer based on a score generated by the scores used to define the match confidence code for each relationship in the set of relationships and on internal data associated with each relationship in the set of relationships. The link confidence code for the customer and the match confidence codes and the respective scores for the set of relationships may be provided to an end user of the database in order to improve decisions made by the end user at the customer level.04-09-2009
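As an illustration of the kind of technique listed in this class, the longest-match dictionary traversal summarized in entry 20090171953 above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation claimed in that application: the names `build_trie`, `find_longest_matches`, and the `END` sentinel are hypothetical, and for simplicity the trie marks the end of a word on any node instead of restricting words to leaf nodes.

```python
from typing import Dict, List

# Hypothetical sentinel key marking "a dictionary word ends at this node".
# It is not a string, so it can never collide with a character of the input.
END = object()


def build_trie(words: List[str]) -> Dict:
    """Organize the dictionary hierarchically, one character per level."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_longest_matches(text: str, trie: Dict) -> List[str]:
    """Scan text left to right, keeping the longest dictionary word found at each start."""
    matches: List[str] = []
    i = 0
    while i < len(text):
        node = trie
        longest_end = -1
        j = i
        # Keep traversing for as long as the characters continue to match the dictionary.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest_end = j  # remember the longest complete word so far
        if longest_end == -1:
            i += 1  # no dictionary word starts here; advance one character
        else:
            matches.append(text[i:longest_end])
            i = longest_end  # resume with the character after the last matched one
    return matches


if __name__ == "__main__":
    trie = build_trie(["in", "inter", "internet", "net", "work"])
    print(find_longest_matches("internetwork", trie))
```

With the assumed word list, the scan of "internetwork" prefers "internet" over the shorter prefixes "in" and "inter", then continues from the next unmatched character and finds "work".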
