Basic Query Syntax
Example Query Description
cat dog Search results have the word cat and the word dog in them. They could also have cats and dogs.
+cat Search results have the word cat in them. A result that has only the word cats is not included. The plus sign requests an exact match: no synonyms, hypernyms, hyponyms, or any other form of the word are used.
mp3 "take five" Search results have the word mp3 and the exact phrase take five in them.
"john smith" -"bob dole" Search results have the phrase john smith but NOT the phrase bob dole in them.
bmx -game Search results have the word bmx but not game.
cat | dog Match documents that have cat and dog in them, but do not allow cat to affect the ranking score, only dog. This is called a query refinement.
document.title:paper That query will match a JSON document like { "document":{"title":"This is a good paper." }} or, alternatively, an XML document like <document><title>This is a good paper</title></document>
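
As a rough illustration of the basic syntax above, the following Python sketch assembles a query string from plain words, exact words, phrases, and excluded terms, then URL-encodes it. The helper name build_query, the SEARCH_URL address, and the "q" parameter name are hypothetical placeholders chosen for this example; they are not taken from this page.

```python
from urllib.parse import urlencode


def build_query(words=(), exact_words=(), phrases=(), excluded=()):
    """Compose a query string using the basic syntax from the table above."""
    parts = list(words)                              # cat dog
    parts += ["+" + w for w in exact_words]          # +cat: exact word only
    parts += ['"%s"' % p for p in phrases]           # "take five": exact phrase
    # -game or -"bob dole": multi-word exclusions are quoted as phrases
    parts += ["-" + ('"%s"' % e if " " in e else e) for e in excluded]
    return " ".join(parts)


query = build_query(words=["mp3"], phrases=["take five"], excluded=["game"])
print(query)

# Hypothetical endpoint and parameter name, shown only to illustrate encoding.
SEARCH_URL = "http://localhost:8000/search"
print(SEARCH_URL + "?" + urlencode({"q": query}))
```

Quoting and the minus sign survive URL encoding as %22 and -, so the query reaches the server with the same meaning it has when typed into the search box.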



Advanced Query Operators
Example Query Description
gbfieldmatch:strings.vendor:"My Vendor Inc." Matches all the meta tag, JSON, or XML fields that have the name "strings.vendor" and contain exactly the provided value, in this case, My Vendor Inc. The match is CASE SENSITIVE and includes punctuation, so it is an exact match. In general, the termlist should be very short, so the query should be fast.
url:www.abc.com/page.html Matches the page with that exact url. Uses the first url, not the url it redirects to, if any.
ext:doc Match documents whose url ends in the .doc file extension.
link:www.gigablast.com/foo.html Matches all the documents that have a link to http://www.gigablast.com/foo.html
sitelink:abc.foobar.com Matches all documents that link to any page on the abc.foobar.com site.
site:mysite.com Matches all documents on the mysite.com domain.
site:www.mysite.com/dir1/dir2/ Matches all documents whose url starts with www.mysite.com/dir1/dir2/
ip:1.2.3.4 Matches all documents whose IP is 1.2.3.4.
ip:1.2.3 Matches all documents whose IP STARTS with 1.2.3.
inurl:dog Matches all documents that have the word dog in their url, like http://www.mysite.com/dog/food.html. However, it will not match http://www.mysite.com/dogfood.html because dog is not an individual word there. It must be delineated by punctuation.
suburl:dog Same as inurl.
title:cat Matches all the documents that have the word cat in their title.
title:"cat food" Matches all the documents that have the phrase "cat food" in their title.
intitle:cat Same as title:cat.
gbinrss:1 Matches all documents that are in RSS feeds. Likewise, use gbinrss:0 to match all documents that are NOT in RSS feeds.
type:json Matches all documents that are in JSON format. Other possible types include html, text, xml, pdf, doc, xls, ppt, ps, css, json, status. status matches special documents that are stored every time a url is spidered so you can see all the spider attempts and when they occurred as well as the outcome.
filetype:json Same as type: above.
gbisadult:1 Matches all documents that have been detected as adult documents and may be unsuitable for children. Likewise, use gbisadult:0 to match all documents that were NOT detected as adult documents.
gbimage:site.com/image.jpg Matches all documents that contain the specified image.
gbhasthumbnail:1 Matches all documents for which Gigablast detected a thumbnail. Likewise use gbhasthumbnail:0 to match all documents that do not have thumbnails.
gbtag* Matches all documents whose tag named * has the specified value in the tagdb entry for the url. Example: gbtagsitenuminlinks:2 matches all documents that have 2 qualified inlinks pointing to their site based on the tagdb record. You can also provide your own tags in addition to the tags already present. See the tagdb menu for more information.
gbzip:90210 Matches all documents that have the specified zip code in their meta zip code tag.
gbcharset:windows-1252 Matches all documents originally in the Windows-1252 charset. Available character sets are listed in the iana_charset.cpp file in the open source distribution. There are a lot. Some more popular ones are: us, latin1, iso-8859-1, csascii, ascii, latin2, latin3, latin4, greek, utf-8, shift_jis.
gblang:de Matches all documents in German. The supported language abbreviations are at the bottom of the url filters page. Some more common ones are gblang:en, gblang:es, gblang:fr, gblang:"zh_cn" (note the quotes for zh_cn!).
gbpathdepth:3 Matches all documents whose url has 3 path components to it like http://somedomain.com/dir1/dir2/dir3/foo.html
gbhopcount:2 Matches all documents that are a minimum of two link hops away from a root url.
gbhasfilename:1 Matches all documents whose url ends in a filename like http://somedomain.com/dir1/myfile and not http://somedomain.com/dir1/dir2/. Likewise, use gbhasfilename:0 to match all the documents that do not have a filename in their url.
gbiscgi:1 Matches all documents that have a question mark in their url. Likewise gbiscgi:0 matches all documents that do not.
gbhasext:1 Matches all documents that have a file extension in their url. Likewise, gbhasext:0 matches all documents that do not have a file extension in their url.
gbsubmiturl:domain.com/process.php Matches all documents that have a form that submits to the specified url.
gbparenturl:www.xyz.com/abc.html Diffbot only. Match the json urls that were extracted from this parent url. Example: gbparenturl:www.gigablast.com/addurl.htm
gbcountry:us Matches documents determined by Gigablast to be from the United States. See the country abbreviations in the CountryCode.cpp file in the open source distribution. Some more popular examples include: de, fr, uk, ca, cn.
gbpermalink:1 Matches documents that are permalinks. Use gbpermalink:0 to match documents that are NOT permalinks.
gbdocid:123456 Matches the document with the docid 123456
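
The inurl: rule above (a word matches only when it is delineated by punctuation) can be pictured with a small Python sketch. This is an approximation written for illustration, assuming that any run of non-alphanumeric characters separates words; it is not Gigablast's actual tokenizer, and the function names are hypothetical.

```python
import re


def url_words(url):
    """Split a url into lowercase words separated by punctuation."""
    return [w for w in re.split(r"[^0-9A-Za-z]+", url.lower()) if w]


def inurl_matches(word, url):
    """Approximate inurl:word: true if word is an individual word in url."""
    return word.lower() in url_words(url)


print(url_words("http://www.mysite.com/dog/food.html"))
print(inurl_matches("dog", "http://www.mysite.com/dog/food.html"))  # True
print(inurl_matches("dog", "http://www.mysite.com/dogfood.html"))   # False
```

In the second url, dogfood is a single word, so dog on its own is never produced by the split and the match fails.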



Numeric Field Query Operators
Example Query Description
cameras gbsortbyfloat:price Sort all documents that contain 'camera' by price. price can be a root JSON field or in a meta tag, or in an xml <price> tag.
cameras gbsortbyfloat:product.price Sort all documents that contain 'camera' by price. price can be in a JSON document like { "product":{"price":1500.00}} or, alternatively, an XML document like <product><price>1500.00</price></product>
cameras gbrevsortbyfloat:product.price Like above example but sorted with highest prices on top.
pilots gbsortbyint:employees Sort all documents that contain 'pilots' by employees. employees can be a root JSON field or in a meta tag, or in an xml <employees> tag. The value it contains is interpreted as a 32-bit integer.
gbsortbyint:gbdocspiderdate Sort all documents by the date they were spidered/downloaded.
gbsortbyint:company.employees Sort all documents by employees. Documents can contain employees in a JSON document like { "company":{"employees":13}} or, alternatively, an XML document like <company><employees>13</employees></company>
gbsortbyint:gbsitenuminlinks Sort all documents by the number of distinct inlinks the document's site has.
gbrevsortbyint:gbdocspiderdate Sort all documents by the date they were spidered/downloaded but with the oldest on top.
cameras gbminfloat:price:109.99 Matches all documents that contain 'camera' or 'cameras' and have a price of at least 109.99. price can be a root JSON field or in a meta tag named price, or in an xml <price> tag.
cameras gbminfloat:product.price:109.99 Matches all documents that contain 'camera' or 'cameras' and have a price of at least 109.99 in a JSON document like { "product":{"price":1500.00}} or, alternatively, an XML document like <product><price>1500.00</price></product>
cameras gbmaxfloat:price:109.99 Like the gbminfloat examples above, but is an upper bound.
gbequalfloat:product.price:1.23 Similar to gbminfloat and gbmaxfloat but is an equality constraint.
gbminint:gbspiderdate:1391749680 Matches all documents with a spider timestamp of at least 1391749680. Use this as opposed to gbminfloat when you need 32 bits of integer precision.
gbmaxint:company.employees:20 Matches all companies with 20 or fewer employees in a JSON document like { "company":{"employees":13}} or, alternatively, an XML document like <company><employees>13</employees></company>
gbequalint:company.employees:13 Similar to gbminint and gbmaxint but is an equality constraint.
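
The advice to prefer the int operators when 32 bits of integer precision are needed can be seen with a short Python sketch. It assumes the float operators store values as 32-bit IEEE 754 floats, which this page does not state explicitly; the helper name as_float32 is hypothetical.

```python
import struct


def as_float32(value):
    """Round-trip a number through a 32-bit IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


timestamp = 1391749680                      # the gbminint example above
print(int(as_float32(timestamp)))           # 1391749632, off by 48 seconds
print(int(as_float32(timestamp)) == timestamp)  # False

employees = 13                              # small integers survive intact
print(as_float32(employees) == employees)   # True
```

A 32-bit float carries only 24 bits of mantissa, so timestamps of this size are rounded to a multiple of 128 seconds, while an integer operator keeps every second.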



Date Related Query Operators
Example Query Description
gbdocspiderdate:1400081479 Matches documents that have that spider date timestamp (UTC). This is the time the document completed downloading.
gbspiderdate:1400081479 Like above.
gbdocindexdate:1400081479 Like above, but is the time the document was last indexed. This time is slightly greater than or equal to the spider date.
gbindexdate:1400081479 Like above.
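
The timestamps in these operators are seconds since the epoch in UTC. The Python sketch below converts between timestamps and dates and builds a spider date range. Combining gbminint and gbmaxint on gbspiderdate in one query is an assumption made for this example (the page shows gbminint:gbspiderdate only), and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def from_epoch(ts):
    """Convert a spider/index timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def to_epoch(dt):
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    return int(dt.timestamp())


def spider_date_range(start, end):
    """Assumed combination of gbminint and gbmaxint on gbspiderdate."""
    return "gbminint:gbspiderdate:%d gbmaxint:gbspiderdate:%d" % (
        to_epoch(start), to_epoch(end))


print(from_epoch(1400081479).isoformat())   # 2014-05-14T15:31:19+00:00
day_start = datetime(2014, 5, 14, tzinfo=timezone.utc)
day_end = datetime(2014, 5, 15, tzinfo=timezone.utc)
print(spider_date_range(day_start, day_end))
```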



Facet Related Query Operators
Example Query Description
gbfacetstr:color Returns facets in the search results by their color field. color is case INsensitive.
gbfacetstr:product.color Returns facets in the color field in a JSON document like { "product":{"color":"red"}} or, alternatively, an XML document like <product><color>red</color></product>. product.color is case INsensitive.
gbfacetstr:gbtagsite cat Returns facets from the site names of all pages that contain the word 'cat' or 'cats', etc. gbtagsite is case insensitive.
gbfacetint:product.cores Returns facets of the cores field in a JSON document like { "product":{"cores":10}} or, alternatively, an XML document like <product><cores>10</cores></product>. product.cores is case INsensitive.
gbfacetint:gbhopcount Returns facets of the gbhopcount field over the documents so you can see the distribution of hopcounts over the index. gbhopcount is case INsensitive.
gbfacetint:gbtagsitenuminlinks Returns facets of the sitenuminlinks field for the sitenuminlinks tag in the tagdb record for each site. Any numeric tag in tagdb can be faceted in this manner, so you can add your own facets on a per-site or per-url basis by making tagdb entries. Case insensitive.
gbfacetint:size,0-10,10-20,30-100,100-200,200-1000,1000-10000 Returns facets of the size field (either a JSON field or a meta tag) and clusters the results into the specified ranges. size is case INsensitive.
gbfacetint:gbsitenuminlinks Returns facets based on # of site inlinks the site of each result has. gbsitenuminlinks is case INsensitive.
gbfacetfloat:product.weight Returns facets of the weight field in a JSON document like { "product":{"weight":1.45}} or, alternatively, an XML document like <product><weight>1.45</weight></product>. product.weight is case INsensitive.
gbfacetfloat:product.price,0-1.5,1.5-5,5.0-20,20-100.0 Similar to above but clusters the prices into the specified ranges. product.price is case insensitive.
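
The ranged facet operators follow a simple pattern: the operator and field, then comma-separated low-high pairs. A minimal Python sketch for assembling such a string is shown below; the helper name facet_query is hypothetical, and the sketch only formats the text. Whether range boundaries are inclusive is not stated on this page, so it makes no claim about that.

```python
def facet_query(operator, field, ranges=()):
    """Build a facet term such as gbfacetint:size,0-10,10-20."""
    buckets = ["%s-%s" % (low, high) for low, high in ranges]
    return ",".join(["%s:%s" % (operator, field)] + buckets)


size_ranges = [(0, 10), (10, 20), (30, 100), (100, 200), (200, 1000), (1000, 10000)]
print(facet_query("gbfacetint", "size", size_ranges))
print(facet_query("gbfacetstr", "color"))          # no ranges: plain facet
print(facet_query("gbfacetfloat", "product.price", [(0, 1.5), (1.5, 5)]))
```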



Spider Status Documents
Example Query Description
gbssUrl:com Query the url of a spider status document.
gbssFinalRedirectUrl:abc.com/page2.html Query on the last url redirected to, if any.
gbssStatusCode:0 Query on the status code of the index attempt. 0 means no error.
gbssStatusMsg:"Tcp timed" Like gbssStatusCode but a textual representation.
gbssHttpStatus:200 Query on the HTTP status returned from the web server.
gbssWasIndexed:0 Was the document in the index before attempting to index? Use 0 or 1 to find all documents that were not or were, respectively.
gbssIsDiffbotObject:1 This field is only present if the document was an object from a diffbot reply. Use gbssIsDiffbotObject:0 to find the non-diffbot objects.
gbsortby:gbssAgeInIndex If the document was in the index at the time we attempted to reindex it, how long has it been since it was last indexed?
gbssDomain:yahoo.com Query on the domain of the url.
gbssSubdomain:www.yahoo.com Query on the subdomain of the url.
gbfacetint:gbssNumRedirects Query on the number of times the url redirected when attempting to index it.
gbssDocId:1234567 Show all the spider status docs for the document with this docId.
gbfacetint:gbssHopCount Query on the hop count of the document.
gbfacetint:gbssCrawlRound Query on the crawl round number.
gbssDupOfDocId:123456 Show all the documents that were considered dups of this docId.
gbssPrevTotalNumIndexAttempts:1 Before this index attempt, how many attempts were there?
gbssPrevTotalNumIndexSuccesses:1 Before this index attempt, how many successful attempts were there?
gbssPrevTotalNumIndexFailures:1 Before this index attempt, how many failed attempts were there?
gbrevsortbyint:gbssFirstIndexed The date in UTC that the document was first indexed.
gbfacetint:gbssContentHash32 The hash of the document content, excluding dates and times. Used internally for deduping.
gbsortbyint:gbssDownloadDurationMS How long it took in milliseconds to download the document.
gbsortbyint:gbssDownloadStartTime When the download started, in seconds since the epoch, UTC.
gbsortbyint:gbssDownloadEndTime When the download ended, in seconds since the epoch, UTC.
gbfacetint:gbssUsedRobotsTxt This is 0 or 1 depending on if robots.txt was not obeyed or obeyed, respectively.
gbfacetint:gbssConsecutiveErrors For the last set of indexing attempts how many were errors?
gbssIp:1.2.3.4 The IP address of the document being indexed. Is 0.0.0.0 if unknown.
gbsortby:gbssIpLookupTimeMS How long it took to look up the IP of the document. Might have been in the cache.
gbsortby:gbssSiteNumInlinks How many good inlinks the document's site had.
gbsortby:gbssSiteRank The site rank of the document. Based directly on the number of inlinks the site had.
gbfacetint:gbssContentInjected This is 0 or 1 if the content was not injected or injected, respectively.
gbfacetfloat:gbssPercentContentChanged A float between 0 and 100, inclusive. Represents how much the document has changed since the last time we indexed it. This is only valid if the document was successfully indexed this time.
gbfacetint:gbssSpiderPriority The spider priority, from 0 to 127, inclusive, of the document according to the url filters table.
gbfacetstr:gbssMatchingUrlFilter The url filter expression the document matched.
gbfacetstr:gbssLanguage The language of the document. If document was empty or not downloaded then this will not be present. Uses xx to mean unknown language. Uses the language abbreviations found at the bottom of the url filters page.
gbfacetstr:gbssContentType The content type of the document. Like html, xml, json, pdf, etc. This field is not present if unknown.
gbsortbyint:gbssContentLen The content length of the document. 0 if empty or not downloaded.
gbfacetint:gbssCrawlDelay The crawl delay according to the robots.txt of the document. This is -1 if not specified in the robots.txt or not found.
gbssSentToDiffbotThisTime:1 Was the document's url sent to diffbot for processing this time of spidering the url?
gbssSentToDiffbotAtSomeTime:1 Was the document's url sent to diffbot for processing, either this time or some time before?
gbssDiffbotReplyCode:0 The reply received from diffbot. 0 means success, otherwise, it indicates an error code.
gbfacetstr:gbssDiffbotReplyMsg:0 The reply received from diffbot represented in text.
gbsortbyint:gbssDiffbotReplyLen The length of the reply received from diffbot.
gbsortbyint:gbssDiffbotReplyResponseTimeMS The time in milliseconds it took to get a reply from diffbot.
gbfacetint:gbssDiffbotReplyRetries The number of times we had to resend the request to diffbot because diffbot returned a 504 gateway timed out error.
gbfacetint:gbssDiffbotReplyNumObjects The number of JSON objects diffbot excavated from the provided url.



Boolean Queries
Example Query Description
Note: boolean operators must be in UPPER CASE.
cat AND dog Search results have the word cat AND the word dog in them.
cat OR dog Search results have the word cat OR the word dog in them, but preference is given to results that have both words.
cat dog OR pig Search results have the two words cat and dog OR search results have the word pig, but preference is given to results that have all three words. This illustrates how the individual words of one operand are all required for that operand to be true.
"cat dog" OR pig Search results have the phrase "cat dog" in them OR they have the word pig, but preference is given to results that have both.
title:"cat dog" OR pig Search results have the phrase "cat dog" in their title OR they have the word pig, but preference is given to results that have both.
cat OR dog OR pig Search results need only have one word, cat or dog or pig, but preference is given to results that have the most of the words.
cat OR dog AND pig Search results have dog and pig, but they may or may not have cat. Preference is given to results that have all three. To evaluate expressions with more than two operands, as in this case where we have three, you can divide the expression up into sub-expressions that consist of only one operator each. In this case we would have the following two sub-expressions: cat OR dog and dog AND pig. Then, for the original expression to be true, at least one of the sub-expressions that have an OR operator must be true, and, in addition, all of the sub-expressions that have AND operators must be true. Using this logic you can evaluate expressions with more than one boolean operator.
cat AND NOT dog Search results have cat but do not have dog.
cat AND NOT (dog OR pig) Search results have cat but do not have dog and do not have pig. When evaluating a boolean expression that contains ()'s you can evaluate the sub-expression in the ()'s first. So if a document has dog or it has pig or it has both, then the expression, (dog OR pig) would be true. So you could, in this case, substitute true for that expression to get the following: cat AND NOT (true) = cat AND false = false. Does anyone actually read this far?
(cat OR dog) AND NOT (cat AND dog) Search results have cat or dog but not both.
left-operand  OPERATOR  right-operand This is the general format of a boolean expression. The possible operators are: OR and AND. The operands can themselves be boolean expressions and can be optionally enclosed in parentheses. A NOT operator can optionally precede the left or the right operand.
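
The sub-expression rule described for cat OR dog AND pig can be sketched in Python as follows. This is only an illustration of the rule as worded in the table, limited to single-word operands without NOT or parentheses; it is not the engine's actual evaluator, and the function name evaluate_chain is hypothetical.

```python
def evaluate_chain(operands, operators, doc_words):
    """Evaluate word1 OP word2 OP word3 ... against a set of document words.

    Each adjacent pair forms a sub-expression. All AND sub-expressions must
    be true and, if there are any OR sub-expressions, at least one of them
    must be true.
    """
    if len(operators) != len(operands) - 1:
        raise ValueError("need exactly one operator between each pair of operands")
    present = [word in doc_words for word in operands]
    if not operators:
        return present[0]
    or_results, and_results = [], []
    for i, op in enumerate(operators):
        left, right = present[i], present[i + 1]
        if op == "OR":
            or_results.append(left or right)
        elif op == "AND":
            and_results.append(left and right)
        else:
            raise ValueError("operators must be upper case AND or OR")
    return all(and_results) and (any(or_results) if or_results else True)


query = (["cat", "dog", "pig"], ["OR", "AND"])      # cat OR dog AND pig
print(evaluate_chain(*query, {"dog", "pig"}))        # True: cat is optional
print(evaluate_chain(*query, {"cat", "pig"}))        # False: dog AND pig fails
print(evaluate_chain(["cat", "dog", "pig"], ["OR", "OR"], {"pig"}))  # True
```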