2016/03/31 The KNApSAcK dataset was updated to the 2015.3.17 version.
2016/02/26 The HMDB dataset was updated to version 3.6 (release 2016.2.21). The LipidMAPS dataset was updated to version 28Jun15.
2013/06/01 KNApSAcK data was updated. Note that the format of the KNApSAcK ID returned in search results has changed as follows.
Before: [Formula]_[KNApSAcK ID] or [Formula]-[KNApSAcK ID]
After: KNApSAcK ID
2013/05/14 Data for the Human Metabolome Database (HMDB) was updated to version 3.5.
All queries to MFSearcher are executed by accessing a URL on the MFSearcher server.
Example: click the following link, or copy and paste the URL into your browser's address field and press the Return key to execute the query.
As shown here, the URL is built from the query information, namely the target database, the search method, and the parameters with their values, together with the symbols that connect them. The results are returned as text data as follows.
ExactMassDB C16H11O2N10P3S1 21.0 500.0000046695
ExactMassDB C32H1N6P1 37.0 500.0000320763
ExactMassDB C6H37O5P9S1 2.0 500.0000372918
ExactMassDB C19H17O10P1S2 13.0 500.0000767653
Help pages are displayed when the URL ends with a slash ("/"), both for the top page (this page) and for each database.
The list of the atomic weights used to calculate accurate mass values in MFSearcher is available from the URL below.
The list is derived from the following paper.
De Laeter JR, Bohlke JK, De Bievre P, Hidaka H, Peiser HS, Rosman KJR, Taylor PDP (2003) Atomic weights of the elements: Review 2000 (IUPAC technical report). Pure Appl Chem 75: 683-800
All mass values in the MFSearcher databases are calculated with this list. The values may therefore differ from the original data in KEGG, PubChem, and so on.
The databases available in MFSearcher are listed below. A query against a database is written by appending it to the base URL of that database. A detailed description of each database is available by accessing its base URL.
|ExactMassDB||A database of possible elemental compositions, containing at most C: 100, H: 200, O: 50, N: 10, P: 10, and S: 10 atoms, that satisfy the Senior and Lewis valence rules.||/mfsearcher/exmassdb/|
|ExactMassDB-HR2||HR2, one of the fastest tools for calculating elemental compositions, filters elemental compositions according to the Seven Golden Rules (Kind and Fiehn, 2007). The ExactMassDB-HR2 database returns the same results as HR2 run with the same element types and atom-number limits as those used to construct the ExactMassDB.||/mfsearcher/exmassdb-hr2/|
|Pep1000||A database of possible linear polypeptides that are constructed with 20 kinds of amino acids and having molecular weights smaller than 1000.||/mfsearcher/pep1000/|
|KEGG||Re-calculated compound data from KEGG. Weekly updated.||/mfsearcher/kegg/|
|KNApSAcK||Re-calculated compound data from KNApSAcK.||/mfsearcher/knapsack/|
|Flavonoid Viewer||Re-calculated compound data from Flavonoid Viewer.||/mfsearcher/flavonoidviewer/|
|LipidMAPS||Re-calculated compound data from LIPID MAPS.||/mfsearcher/lipidmaps/|
|HMDB||Re-calculated compound data from Human Metabolome Database (HMDB) Version 3.6.||/mfsearcher/hmdb/|
|PubChem||Re-calculated compound data from PubChem. Monthly updated.||/mfsearcher/pubchem/|
|N2D||The Neutralized and 2-Dimensional compound database (N2D) has the following features.
- Compounds registered as charged molecules (e.g., [M]+) in the databases can be searched correctly with a normal adduct setting (e.g., [M+H]+).
- Redundancy between the databases has been removed, and compounds having the same atom connectivity are compiled into one candidate in the results.
Therefore, the N2D search excludes mis-hits (false positives) to salts and charged molecules, and returns more concise and interpretable results.
A search is executed by appending one of the search methods and its parameters, listed below, to the base URL of the target database, and then accessing the resulting URL with a web browser or any HTTP client. All databases accept the same methods. The results are returned as text.
|/range||This method searches the molecular formulae by two given mass values, the lower and upper limit masses. The molecular formulae whose formula weights are larger than or equal to the lower mass and smaller than or equal to the upper mass are returned. If the given lower mass value is larger than the upper mass value, they are swapped before searching.|
|lowerMs||yes||The lower limit of mass value||a real number||0.0|
|upperMs||yes||The upper limit of mass value||a real number||0.0|
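The behaviour of the range method described above (inclusive bounds, with the two limits swapped when given in reverse order) can be mimicked locally as in the following sketch. The function names are hypothetical and this is not the server's implementation; it only restates the documented rule.

```python
def normalize_range(lower_ms, upper_ms):
    """Return (lower, upper), swapping the values if they are reversed."""
    if lower_ms > upper_ms:
        lower_ms, upper_ms = upper_ms, lower_ms
    return lower_ms, upper_ms


def in_range(formula_weight, lower_ms, upper_ms):
    """Inclusive test: lower <= formula_weight <= upper."""
    lower, upper = normalize_range(lower_ms, upper_ms)
    return lower <= formula_weight <= upper


# The limits are given in reverse order here and are swapped before the test.
print(in_range(500.0000046695, 500.001, 500))
```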
|/mass||This method searches the molecular formulae by a target mass value and a margin. The lower limit mass is calculated as the target mass - margin, and the upper limit mass as the target mass + margin. The molecular formulae whose molecular weights lie between the lower and upper masses are then returned, in the same way as for the range method. The unit of the margin can be defined by the marginUnit parameter.|
|targetMs||yes||The target mass (the center of the search mass range).||a real number||0.0|
|margin||yes||The mass margin. The molecular formulae whose molecular weight x fulfills the following condition are returned.
targetMs - margin <= x <= targetMs + margin
|a real number||0.0|
|marginUnit||no||The unit of the margin. When "ppm" is set, a recalculated margin y is used instead of the margin above.
y = margin * targetMs * 0.000001
"ms" or "ppm"
|/version||This method returns the version information (updated date) of the database. The date of the compound data construction for the databases in MFSearcher is shown as "update-date". Please note that "update-date" doesn't mean the versions of the original databases, KEGG, PubChem, and KNApSAcK.|
|/||This shows the help page of the database. A detailed description of the database, notes on the result values, the default value of the "limit" parameter, the update schedule, and other information are available.|
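The mass window used by the /mass method, as described above, can be reproduced with a few lines of Python. This is only a sketch of the documented formulas; the function name `mass_window` is hypothetical, and the unit is passed explicitly because no default for marginUnit is stated here.

```python
def mass_window(target_ms, margin, margin_unit):
    """Return (lower, upper) mass limits for a target mass and a margin.

    margin_unit is "ms" (absolute mass units) or "ppm". For "ppm" the
    margin is first converted with y = margin * targetMs * 0.000001.
    """
    if margin_unit == "ppm":
        margin = margin * target_ms * 0.000001
    elif margin_unit != "ms":
        raise ValueError('margin_unit must be "ms" or "ppm"')
    return target_ms - margin, target_ms + margin


# 10 ppm around a target mass of 500 corresponds to an absolute margin of 0.005.
lower, upper = mass_window(500.0, 10, "ppm")
print(lower, upper)
```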
The following optional search parameters are accepted by several methods.
|limit||no||It defines the maximum number of search results returned by MFSearcher. Even if the number of records matching the query exceeds the limit, MFSearcher returns only the first records up to the limit, which protects users from unexpectedly long waiting times. When the number of hits exceeds the limit, the "is-limited" value is set to "true"; it appears in the result text when "txth" or "xml" is selected as the "output" parameter (see below).
||a positive integer||set in each database (see the help of the database)|
|output||no||This parameter defines the type of output format
(see Output Format section for details).
txt: a tab delimited text format which carries only the result records.
txth: in addition to the "txt"-results, information of the query and the results are attached as a header.
xml: a text in an xml format having the same information as txth.
When the output parameter is used for the "version" method, the same results will be returned for "txt" and "txth".
The "db" option is used in N2D.
|db||yes||The two-letter signatures of the databases to search.
KG: KEGG, KN: KNApSAcK, FL: FlavonoidViewer, HM: HMDB, LM: LipidMAPS, UN: UNPD, PC: PubChem
|KG, KN, FL, HM, LM, UN, PC
Multiple databases can be specified by joining their signatures with commas. Do not include spaces.
The following method and parameters are used in the N2D search.
|/search||This method searches for compounds in the N2D database by formula, InChIKey (skeleton), or compound ID.|
|inchikey||no||An InChIKey or its first block. Only the first 14 letters of the input are used for the search.||text||WHUUTDBJXJRKMK|
|id||no||A compound ID written as a database signature (two letters) and the compound ID in that database, joined by a colon.||text||KN:C00001358|
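The N2D parameter values described above can be prepared as in the sketch below. All three helper names are hypothetical. Only the value formats stated on this page are encoded; the example does not build a full N2D URL.

```python
# Signatures listed for the "db" parameter on this page.
KNOWN_SIGNATURES = {"KG", "KN", "FL", "HM", "LM", "UN", "PC"}


def db_value(signatures):
    """Join two-letter database signatures with commas and no spaces."""
    for signature in signatures:
        if signature not in KNOWN_SIGNATURES:
            raise ValueError(f"unknown database signature: {signature}")
    return ",".join(signatures)


def inchikey_skeleton(inchikey):
    """Keep only the first 14 letters, the part used for the search."""
    return inchikey.strip()[:14]


def compound_id(signature, original_id):
    """Join a database signature and a compound ID with a colon."""
    return f"{signature}:{original_id}"


print(db_value(["KG", "KN"]))
print(inchikey_skeleton("WHUUTDBJXJRKMK-VKHMYHEASA-N"))
print(compound_id("KN", "C00001358"))
```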
Search results are returned from the server as text. MFSearcher provides the following three text formats, and users can select one of them with the "output" parameter.
|The value for the output parameter||Description|
|txt||A tab-delimited text format. For each hit record, the database name, molecular formula, DBE (double bond equivalent), formula weight, ID in the original database, and name or description of the formula are written in one row, separated by tabs.|
KEGG C23H24O9N4 0.0 500.1543283959 C00927 Isonocardicin A;1-Azetidineacetic acid
|txth||A text format with header information. In addition to the text format above, a header carrying the search conditions, result information, and so on is attached.
The header lines start with "#". The XML element names (see below) and their values are written in each row as tab-delimited text. The last row of the header lists the element names in the "result" tag as tab-delimited text; they correspond to the columns of the search results that follow.
# database-name KEGG
|xml||A text in XML format. Please refer to the next section for the meaning of the XML elements (tag). The definition of the XML tags in DTD is available at the following URL.
<?xml version="1.0" encoding="UTF-8"?>
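A "txth" response, as described above, can be split into header values and result records with a short parser. The following Python sketch rests on assumptions: the sample text is hypothetical (built from the format description, not copied from a real response), the column names in its last header row are guesses, and `parse_txth` is not part of MFSearcher.

```python
def parse_txth(text):
    """Split "txth" output into a header dict and a list of record dicts.

    Lines starting with "#" form the header. Each holds a name and a value
    separated by a tab, except the last one, which lists the column names
    of the result rows that follow.
    """
    header_lines, data_lines = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            header_lines.append(line[1:].strip())
        else:
            data_lines.append(line)

    header, columns = {}, []
    if header_lines:
        *info_lines, column_line = header_lines
        columns = column_line.split("\t")
        for item in info_lines:
            name, _, value = item.partition("\t")
            header[name.strip()] = value.strip()

    records = [dict(zip(columns, line.split("\t"))) for line in data_lines]
    return header, records


# Hypothetical sample; real responses may contain more header rows.
sample = (
    "# database-name\tKEGG\n"
    "# is-limited\tfalse\n"
    "# database-name\tmolecular-formula\tdbe\tformula-weight\tid\tdescription\n"
    "KEGG\tC23H24O9N4\t0.0\t500.1543283959\tC00927\t"
    "Isonocardicin A;1-Azetidineacetic acid\n"
)
header, records = parse_txth(sample)
print(header["is-limited"], records[0]["molecular-formula"])
```

Plain "txt" output has no header, so its rows can be split on tabs directly.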
The meanings of the XML elements (tags) are as follows. The element names are used in both "xml" and "txth" formats.
|mfsearcher||Shows that the XML document is for Kazusa MFSearcher.||(none)|
|database-info||Information about the databases in MFSearcher. It will be returned when the "version" method is used.||(none)|
|record-num||Record number in the database.||integer|
|update-date||The updated date of the database. It means the date of compound data construction for the databases in MFSearcher. Please note that "update-date" doesn't mean the versions of the original databases, KEGG, PubChem, and KNApSAcK.||YYYY-MM-DD|
|search-info||Information about the search query. It will be returned when "range" or "mass" method is used.||(none)|
|search-date||The date and time when the query was executed.||YYYY-MM-DD HH:mm:ss|
|database-name||The database name for searching.||text|
|search-mode||The mode of the searching.||"range" or "mass"|
|target-mass||The value for "targetMs" parameter (the center of the search mass range). It will only appear when the "mass" method was executed.||real number|
|margin||The value for "margin" parameter. It will only appear when the "mass" method was executed.||real number|
|margin-unit||The value for "marginUnit" parameter. It will only appear when the "mass" method was executed.||"ppm" or "ms"|
|lower-mass||The lower limit mass value actually used for the searching. Not the given value as "lowerMs" parameter in the "range" method.||real number|
|upper-mass||The upper limit mass value actually used for the searching. Not the given value as "upperMs" parameter in the "range" method.||real number|
|result-limitation||Information on the limitation of result numbers.||(none)|
|set-value||The set value of the limitation.||integer|
|is-limited||When the number of hit records exceeds the set value, it is set to "true". Otherwise, it is "false".||"true" or "false"|
|result-record-number||The result number actually returned.||integer|
|search-results||Information of the search results.||(none)|
|result||Information of a record in the results.||(none)|
|molecular-formula||Molecular formula (elemental composition)||text|
|dbe||DBE (double bond equivalent). The value is calculated only for the ExactMassDB. In the other databases, the value is always set to 0.||real number|
|formula-weight||Formula weight (molecular weight). The exact mass values are calculated based on the atom weight list which is available here
|id||ID in the original compound databases. It is always blank for ExactMassDB and Pep1000 databases.
In the results of an N2D search, each database name and compound ID pair is written as "Signature:Compound ID". When multiple compounds share the same connectivity, the pairs are joined by commas.
Square brackets "[ ]" containing some letters may be inserted in front of the compound ID. The letters inside show the state of the compound as registered in the original database: a number, the charge of the molecule; f, multiple components were registered in the record (e.g., salts); r, the compound was a radical.
|description||Name or description of the formula. It is always blank for ExactMassDB.
In the results of an N2D search, the shortest description among the searched databases is shown as a representative.
|inchikey-skeleton||The first block (14 letters) of InChIKey. This item is only given to the results from N2D search.||text|
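The element names listed above are enough to read the "xml" output with Python's standard library. The sketch below is hedged: the nesting of the sample document is an assumption inferred from the order of the element list, the values in it are hypothetical, and the function names are not part of MFSearcher. The code looks elements up by name only, so it does not depend on the exact nesting above the "result" elements.

```python
import xml.etree.ElementTree as ET


def parse_results(xml_text):
    """Return one dict per "result" element, keyed by child element name."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "") for child in result}
        for result in root.iter("result")
    ]


def is_limited(xml_text):
    """True when the "is-limited" element is present and set to "true"."""
    root = ET.fromstring(xml_text)
    node = next(root.iter("is-limited"), None)
    return node is not None and (node.text or "").strip() == "true"


# Hypothetical sample document; the real nesting may differ.
sample_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<mfsearcher>"
    "<result-limitation><set-value>100</set-value>"
    "<is-limited>false</is-limited></result-limitation>"
    "<search-results><result>"
    "<molecular-formula>C23H24O9N4</molecular-formula>"
    "<dbe>0.0</dbe>"
    "<formula-weight>500.1543283959</formula-weight>"
    "<id>C00927</id>"
    "<description>Isonocardicin A;1-Azetidineacetic acid</description>"
    "</result></search-results>"
    "</mfsearcher>"
)
xml_records = parse_results(sample_xml)
print(xml_records[0]["molecular-formula"], is_limited(sample_xml))
```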
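The following example shows how to use MFSearcher from a programming language (here, Perl with LWP::Simple).
use LWP::Simple;  # provides get()
my $url = "http://webs2.kazusa.or.jp/mfsearcher/exmassdb/range?lowerMs=500&upperMs=500.001&output=txth";
my $res = get($url);
print defined $res ? $res : "Request failed\n";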
A Java GUI tool for search with MFSearcher is available.
|zip file||MFSearcher_1.4.2.zip (842 KB)|
|Manual||MFSearcher_manual_en.pdf (324 KB)|
In the MFSearcher system, the compound data provided by KEGG, Flavonoid Viewer, LIPID MAPS, HMDB, PubChem, and UNPD were downloaded for academic purposes. The compound data of KNApSAcK were provided by Prof. Kanaya of the Nara Institute of Science and Technology (NAIST). Parts of these data are used, after re-calculating the molecular weights, to construct dedicated databases for rapid mass searching in the MFSearcher system. Please observe the terms of use of each original database when using the results of MFSearcher searches against these databases.
The searching system of MFSearcher, the ExactMassDB database, and the Pep1000 database by Kazusa DNA Research Institute are licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Sakurai N, Ara T, Kanaya S, Nakamura Y, Iijima Y, Enomoto M, Motegi T, Aoki K, Suzuki H and Shibata D (2012) Application of a relational database system for high-throughput prediction of elemental compositions from accurate mass values. Bioinformatics 29 (2): 290-291
Shinbo Y, Nakamura Y, Altaf-Ul-Amin M, Asahi H, Kurokawa K, Arita M, Saito K, Ohta D, Shibata D and Kanaya S (2006). KNApSAcK: A comprehensive species-metabolite relationship database. Biotechnology in Agriculture and Forestry. K. Saito, R. A. Dixon and L. Willmitzer. Berlin Heidelberg, Springer-Verlag. 57: 165-181.
The Seven Golden Rules
Kind T and Fiehn O (2007) Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry, BMC Bioinformatics, 8, 105.
A part of the development of MFSearcher was supported by New Energy and Industrial Technology Development Organization (NEDO, Japan) as part of a project entitled the 'Development of Fundamental Technologies for Controlling the Material Production Process of Plants' [P02001]. A part of this work was also supported by Japan Science and Technology Agency (JST, Japan), as part of the project entitled 'Life Science Database Integration Project' of National Bioscience Database Center (NBDC).
|Nozomu Sakurai, Ph.D.||Kazusa DNA Research Institute||design, development|
|Shigehiko Kanaya, Ph.D.||Nara Institute of Science and Technology||cooperation on KNApSAcK data|
|Hideyuki Suzuki, Ph.D.||Kazusa DNA Research Institute|
|Daisuke Shibata, Ph.D.||Kazusa DNA Research Institute||the director|
This web site is administered by Kazusa DNA Research Institute. Please send all inquiries to the following e-mail address.
Nozomu Sakurai: sakurai AT kazusa.or.jp (replace "AT" with "@")
|KOMICS||Metabolomics activities in our laboratory.|