
Welcome to MFSearcher!

Kazusa Molecular Formula Searcher (MFSearcher) is a RESTful web service for...

  • high-throughput prediction of elemental compositions from accurate mass values detected by high-resolution mass spectrometers.
  • rapid searching of the major compound databases: KEGG, PubChem, KNApSAcK, LIPID MAPS, Flavonoid Viewer, and HMDB.
  • removal of redundant entries for the same compound across the databases by UC2 search.

A Java-based GUI tool for searching with MFSearcher is available.

Publications:

Sakurai N, Narise T, Sim JS, Lee CM, Ikeda C, Akimoto N and Kanaya S (2018) UC2 search: using unique connectivity of uncharged compounds for metabolite annotation by database searching in mass spectrometry-based metabolomics. Bioinformatics 34 (4): 698-700
[PMID: 29040459]

Sakurai N, Ara T, Kanaya S, Nakamura Y, Iijima Y, Enomoto M, Motegi T, Aoki K, Suzuki H and Shibata D (2013) Application of a relational database system for high-throughput prediction of elemental compositions from accurate mass values. Bioinformatics 29 (2): 290-291
[PMID: 23162084]


URL Change Notice and Request

The hostname in the URL of the MFSearcher service has been changed from "webs2.kazusa.or.jp" to "webs2.kazusa-db.jp". Please update the URL in your links and bookmarks, especially URLs used for API access. The previous hostname, "webs2.kazusa.or.jp", will be retired completely after a certain period. See the details on this page.
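If MFSearcher URLs are stored in scripts or configuration files, the hostname change can also be applied programmatically. The Python sketch below is illustrative only: the helper name `migrate_url` is hypothetical, and it swaps only the host part, leaving the scheme (http or https), path, and query untouched, because this page does not say whether the scheme should change as well.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "webs2.kazusa.or.jp"  # hostname being retired
NEW_HOST = "webs2.kazusa-db.jp"  # current hostname


def migrate_url(url: str) -> str:
    """Return url with the old MFSearcher hostname replaced by the new one.

    Only the host is changed; scheme, port, path and query are kept.
    URLs that point to other hosts are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    # Keep an explicit port number if the stored URL had one.
    netloc = NEW_HOST if parts.port is None else f"{NEW_HOST}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```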

News

2022/4/4
Data for KNApSAcK were updated.
2022/3/17
Data for HMDB and LIPID MAPS were updated.
2020/2/6
The link URL to KNApSAcK was updated in the MFSearcher Java tool.

Details and previous news...

Basic Use

Execution of Search

All MFSearcher queries are executed by accessing URLs on the MFSearcher server.

Ex.) Click the following link, or copy and paste the URL into your browser's address bar and press the Return key to execute the query.

http:// /mfsearcher/exmassdb/range?lowerMs=500&upperMs=500.01

As shown here, the URL is composed of the query information (the target database, the search method, and the parameters and their values) joined by the symbols "/", "?", "=", and "&". The results are returned as text data, as follows.

ExactMassDB C16H11O2N10P3S1 21.0 500.0000046695
ExactMassDB C32H1N6P1 37.0 500.0000320763
ExactMassDB C6H37O5P9S1 2.0 500.0000372918
ExactMassDB C19H17O10P1S2 13.0 500.0000767653
   :
   :
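Because every query is just a URL, it can be composed programmatically. The following Python sketch assumes the base URL `https://webs2.kazusa-db.jp/mfsearcher` (the host used in the sample program on this page); the function name `build_query_url` is hypothetical. Commas and colons are left unescaped so that values such as `db=FL,KG` look exactly like the examples on this page.

```python
from urllib.parse import urlencode

# Assumption: base URL taken from the sample program on this page.
BASE_URL = "https://webs2.kazusa-db.jp/mfsearcher"


def build_query_url(database: str, method: str, **params) -> str:
    """Compose an MFSearcher query URL (hypothetical helper).

    database -- path name of the database, e.g. "exmassdb" or "kegg"
    method   -- search method, e.g. "range", "mass" or "version"
    params   -- query parameters, e.g. lowerMs=500, upperMs=500.01
    """
    url = f"{BASE_URL}/{database}/{method}"
    if params:
        # urlencode joins name=value pairs with "&"; "," and ":" stay literal.
        url += "?" + urlencode(params, safe=",:")
    return url
```

For example, `build_query_url("exmassdb", "range", lowerMs=500, upperMs=500.01)` produces the query above on the current host. The text result could then be fetched with `urllib.request.urlopen(url).read().decode("utf-8")`; decoding as UTF-8 is an assumption based on the XML declaration shown in the Output Format section.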

Help

Help pages are displayed when a URL ends with a slash ("/"), both for the top page (this page) and for each database.

/mfsearcher/
/mfsearcher/exmassdb/

Reference to the Atomic Weight Table

The list of atomic weights used by MFSearcher to calculate accurate mass values is available at the URL below.

/mfsearcher/atomlist

The list is taken from the following paper.

De Laeter JR, Bohlke JK, De Bievre P, Hidaka H, Peiser HS, Rosman KJR, Taylor PDP (2003) Atomic weights of the elements: Review 2000 (IUPAC technical report). Pure Appl Chem 75: 683-800

All mass values in the MFSearcher databases are calculated with this list; therefore, they may differ from the original values in KEGG, PubChem, and other databases.
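To illustrate how the formula weight and DBE columns in the sample output above relate to a molecular formula, the Python sketch below recomputes them. It is an illustration under stated assumptions, not MFSearcher's own code: the isotope masses are standard monoisotopic values for 12C, 1H, 16O, 14N, 31P, and 32S, which may differ in the last digits from /mfsearcher/atomlist, and the valences (N = 3, P = 5, S = 2) were chosen because they reproduce the DBE values of the ExactMassDB sample rows (e.g. 21.0 for C16H11O2N10P3S1). DBE is computed as 1 + sum of n_i * (v_i - 2) / 2 over all elements.

```python
import re

# Assumption: monoisotopic masses of the most abundant isotopes.
# The authoritative values used by MFSearcher are at /mfsearcher/atomlist.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "O": 15.99491461956,
    "N": 14.0030740048,
    "P": 30.97376163,
    "S": 31.97207100,
}

# Assumption: valences that reproduce the DBE column of the ExactMassDB
# sample output (N trivalent, P pentavalent, S divalent).
VALENCE = {"C": 4, "H": 1, "O": 2, "N": 3, "P": 5, "S": 2}


def parse_formula(formula: str) -> dict:
    """Split a formula such as "C16H11O2N10P3S1" into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula: str) -> float:
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def dbe(formula: str) -> float:
    """Double bond equivalent: 1 + sum over atoms of (valence - 2) / 2."""
    return 1 + sum(n * (VALENCE[el] - 2) / 2 for el, n in parse_formula(formula).items())
```

With these values, `formula_mass("C23H24O9N4")` agrees with the KEGG sample rows (500.1543283959) to within 1e-8, while formulas containing P or S deviate by a few millionths of a unit, presumably because the P and S masses assumed here differ from those in the atom list.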

Databases

The databases available in MFSearcher are listed below. A query against a database is appended to the base URL of that database. A detailed description of each database is shown by simply accessing its base URL.

Database Description Base URL
ExactMassDB A database of possible elemental compositions consisting of up to C: 100, H: 200, O: 50, N: 10, P: 10, and S: 10 atoms that satisfy the Senior and Lewis valence rules. /mfsearcher/exmassdb/
ExactMassDB-HR2 HR2, one of the fastest tools for calculating elemental compositions, filters elemental compositions according to the Seven Golden Rules (Kind and Fiehn, 2007). The ExactMassDB-HR2 database returns the same results as HR2 run with the same element types and atom-count limits as those used to construct ExactMassDB. /mfsearcher/exmassdb-hr2/
Pep1000 A database of possible linear polypeptides constructed from 20 kinds of amino acids, with molecular weights below 1000. /mfsearcher/pep1000/
KEGG Re-calculated compound data from KEGG. Weekly updated. /mfsearcher/kegg/
KNApSAcK Re-calculated compound data from KNApSAcK. /mfsearcher/knapsack/
Flavonoid Viewer Re-calculated compound data from Flavonoid Viewer. /mfsearcher/flavonoidviewer/
LipidMAPS Re-calculated compound data from LIPID MAPS. /mfsearcher/lipidmaps/
HMDB Re-calculated compound data from Human Metabolome Database (HMDB) Version 3.6. /mfsearcher/hmdb/
PubChem Re-calculated compound data from PubChem. Monthly updated. /mfsearcher/pubchem/
UC2 Unique Connectivity of Uncharged Compound database (UC2) has the following features.
- Compounds registered as charged molecules (e.g., [M]+) in the databases can be correctly searched with a normal adduct setting (e.g., [M+H]+).
- Redundancy between the databases is removed, and compounds having the same atom connectivity are merged into one candidate in the results.
Therefore, UC2 search avoids false-positive hits to salts and charged molecules and yields more concise and interpretable results.
/mfsearcher/uc2/

Search Methods and Parameters

A database is searched by appending one of the methods and parameters listed below to the base URL of the target database and accessing the resulting URL with a web browser or any HTTP client. All databases accept the same methods. The results are returned as text.

Method Description
/range This method searches molecular formulae between two given mass values, a lower and an upper limit. Molecular formulae with formula weights greater than or equal to the lower mass and less than or equal to the upper mass are returned. If the given lower mass is larger than the upper mass, the two values are swapped before searching.
Parameter required Description Value Default
lowerMs yes The lower limit of the mass range a real number 0.0
upperMs yes The upper limit of the mass range a real number 0.0

/mfsearcher/exmassdb/range?lowerMs=500&upperMs=500.01
/mfsearcher/kegg/range?lowerMs=500&upperMs=500.1

/mass This method searches molecular formulae by a target mass and a margin. The lower limit mass is calculated as the target mass minus the margin, and the upper limit mass as the target mass plus the margin. Molecular formulae with molecular weights between the lower and upper masses are then returned, in the same way as with the range method. The unit of the margin can be set with the marginUnit parameter.
Parameter required Description Value Default
targetMs yes The target mass (the center of the search mass range). a real number 0.0
margin yes The mass margin. Molecular formulae whose molecular weight x fulfills the following condition are returned.
targetMs - margin <= x <= targetMs + margin
a real number 0.0
marginUnit no The unit of the margin. When "ppm" is set, a recalculated margin y is used instead of the margin above.
y = margin * targetMs * 0.000001
text

"ms" or "ppm"
ms

/mfsearcher/exmassdb/mass?targetMs=500&margin=0.01
/mfsearcher/pubchem/mass?targetMs=500&margin=1&marginUnit=ppm

/version This method returns the version information (update date) of the database. The date on which the compound data for each database in MFSearcher were constructed is shown as "update-date". Note that "update-date" does not indicate the versions of the original databases, KEGG, PubChem, and KNApSAcK.

/mfsearcher/kegg/version
/mfsearcher/pubchem/version

/ This shows the help page of the database, including a detailed description of the database, notes on the result values, the default value of the "limit" parameter, the update schedule, and other information.

/mfsearcher/
/mfsearcher/exmassdb/
/mfsearcher/kegg/
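The mass window rules described for /range and /mass can be mirrored on the client, for example to preview the window a /mass query will cover or to re-check returned records. This Python sketch only restates the documented rules (inclusive limits, swapping of reversed limits, and y = margin * targetMs * 0.000001 for ppm); the function names are hypothetical, and the server's own floating-point handling at the exact boundaries may differ.

```python
def mass_window(target_ms: float, margin: float, margin_unit: str = "ms") -> tuple:
    """Lower and upper limits searched by /mass (hypothetical helper)."""
    if margin_unit == "ppm":
        # y = margin * targetMs * 0.000001
        margin = margin * target_ms * 0.000001
    elif margin_unit != "ms":
        raise ValueError('marginUnit must be "ms" or "ppm"')
    return target_ms - margin, target_ms + margin


def in_range(mass: float, lower_ms: float, upper_ms: float) -> bool:
    """Inclusive test as described for /range; reversed limits are swapped first."""
    if lower_ms > upper_ms:
        lower_ms, upper_ms = upper_ms, lower_ms
    return lower_ms <= mass <= upper_ms
```

For example, `mass_window(500, 1, "ppm")` gives approximately (499.9995, 500.0005), the window searched by /mfsearcher/pubchem/mass?targetMs=500&margin=1&marginUnit=ppm.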

Common Parameters

The following optional search parameters are accepted by several methods.

Common to
/range and
/mass
Parameter required Description Value Default
limit no The maximum number of search results returned by MFSearcher. Even if more records match the query conditions, MFSearcher returns only the first records up to this number, which lets users avoid unexpectedly long waiting times. When the number of hit records exceeds the limit, the "is-limited" value is set to "true"; this value is shown in the result text when "txth" or "xml" is selected as the "output" parameter (see below).
a positive integer set in each database (see the help of the database)

/mfsearcher/exmassdb/range?lowerMs=500&upperMs=500.01&limit=20
/mfsearcher/pubchem/mass?targetMs=500&margin=1&limit=50

Common to
/range,
/mass,
and
/version
Parameter required Description Value Default
output no This parameter defines the type of output format
(see Output Format section for details).

txt: a tab-delimited text format that carries only the result records.

txth: in addition to the "txt" results, information about the query and the results is attached as a header.

xml: text in XML format containing the same information as txth.

When the output parameter is used with the "version" method, "txt" and "txth" return the same results.
text

"txt",
"txth", or
"xml"
txt

/mfsearcher/exmassdb/range?lowerMs=500&upperMs=500.01&output=txth
/mfsearcher/kegg/mass?targetMs=500&margin=0.1&output=xml
/mfsearcher/pep1000/version?output=xml

UC2 Specific Parameters

The "db" parameter is used in UC2.

Parameter required Description Value
db yes Two-letter signatures of the databases.
KG: KEGG, KN: KNApSAcK, FL: FlavonoidViewer, HM: HMDB, LM: LipidMAPS, UN: UNPD, PC: PubChem
KG, KN, FL, HM, LM, UN, PC
Multiple databases can be specified by joining the signatures with commas. Do not include spaces.

/mfsearcher/uc2/range?lowerMs=286.04&upperMs=286.05&db=FL
/mfsearcher/uc2/range?lowerMs=286.04&upperMs=286.05&db=FL,KG,KN,HM,LM
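Since the db value must be a comma-separated list without spaces, it can be convenient to build it from a list of signatures. In this Python sketch, the helper `uc2_db_param` is hypothetical; it normalises input to the upper-case signatures listed above and rejects anything else.

```python
# Database signatures accepted by the UC2 "db" parameter (from the table above).
UC2_SIGNATURES = {
    "KG": "KEGG",
    "KN": "KNApSAcK",
    "FL": "FlavonoidViewer",
    "HM": "HMDB",
    "LM": "LipidMAPS",
    "UN": "UNPD",
    "PC": "PubChem",
}


def uc2_db_param(signatures) -> str:
    """Join validated signatures with commas and no spaces (hypothetical helper)."""
    cleaned = [s.strip().upper() for s in signatures]
    unknown = [s for s in cleaned if s not in UC2_SIGNATURES]
    if not cleaned or unknown:
        raise ValueError(f"unknown or missing UC2 signatures: {unknown}")
    return ",".join(cleaned)
```

For example, `uc2_db_param(["FL", "KG", "KN", "HM", "LM"])` returns "FL,KG,KN,HM,LM", the value used in the second example URL.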

The following method and parameters are used for UC2 search.

Method Description
/search This method searches compounds in the UC2 database by formula, InChIKey (skeleton), or compound ID.
Parameter Required Description Value Example
formula no Formula text C5H9NO4

/mfsearcher/uc2/search?formula=C5H9NO4&db=KG,KN,HM

Parameter Required Description Value Example
inchikey no The first block of an InChIKey. Only the first 14 letters of the input are used for the search. text WHUUTDBJXJRKMK

/mfsearcher/uc2/search?inchikey=WHUUTDBJXJRKMK&db=KG,KN,HM

Parameter Required Description Value Example
id no A compound ID, written as a database signature (two letters) and the compound ID in that database joined by a colon. text KN:C00001358

/mfsearcher/uc2/search?id=KN:C00001358&db=KG,KN,HM
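The three kinds of /search queries can be prepared with a small helper. In this Python sketch, the function `uc2_search_params` is hypothetical; it truncates InChIKeys to their first 14 letters as described above, checks that an id has the "Signature:Compound ID" form, and assumes (the page does not say) that formula, inchikey, and id are used one at a time.

```python
import re


def uc2_search_params(db: str, formula=None, inchikey=None, compound_id=None) -> dict:
    """Build query parameters for /mfsearcher/uc2/search (hypothetical helper).

    Assumption: exactly one of formula, inchikey or compound_id is given.
    """
    given = {k: v for k, v in
             (("formula", formula), ("inchikey", inchikey), ("id", compound_id))
             if v is not None}
    if len(given) != 1:
        raise ValueError("give exactly one of formula, inchikey or compound_id")
    if "inchikey" in given:
        # Only the first 14 letters (the skeleton block) are used by the search.
        given["inchikey"] = given["inchikey"][:14]
    if "id" in given and not re.fullmatch(r"[A-Z]{2}:\S+", given["id"]):
        raise ValueError('id must look like "KN:C00001358"')
    return {**given, "db": db}
```

The returned dict can be encoded with `urllib.parse.urlencode(params, safe=",:")`, which keeps values such as "KG,KN,HM" and "KN:C00001358" readable, as in the example URLs.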

Output Format

Search results are returned from the server as text. MFSearcher provides the following three text formats, which can be selected with the "output" parameter.

The value for the output parameter Description
txt
(default)
A tab-delimited text format. Each hit record is written in one row, with the database name, molecular formula, DBE (double bond equivalent), formula weight, ID in the original database, and name or description of the formula separated by tabs.
Ex.) /mfsearcher/kegg/range?lowerMs=500&upperMs=501&output=txt

KEGG C23H24O9N4 0.0 500.1543283959 C00927 Isonocardicin A;1-Azetidineacetic acid
KEGG C23H24O9N4 0.0 500.1543283959 C01941 Nocardicin A
KEGG C23H24O9N4 0.0 500.1543283959 C17350 Nocardicin B
  :

txth A text format with header information. In addition to the text format above, a header carrying the search conditions, a summary of the results, and so on is attached.
The header lines start with "#". Each row gives an XML element name (see below) and its value as tab-delimited text. The last header row lists the element names in the "result" tag as tab-delimited text; these correspond to the columns of the search results that follow.
Ex.) /mfsearcher/kegg/range?lowerMs=500&upperMs=501&output=txth

# database-name KEGG
# search-date 2010-12-17 15:34:34
# search-mode range
# lower-mass 500.0
# upper-mass 501.0
# result-limitation_set-value 100
# result-limitation_is-limited false
# result-record-number 11
# db-name molecular-formula dbe formula-weight id description
KEGG C23H24O9N4 0.0 500.1543283959 C00927 Isonocardicin A;1-Azetidineacetic acid
KEGG C23H24O9N4 0.0 500.1543283959 C01941 Nocardicin A
KEGG C23H24O9N4 0.0 500.1543283959 C17350 Nocardicin B
  :

xml Text in XML format. See the next section for the meaning of the XML elements (tags). The DTD defining the XML tags is available at the following URL.
http:// /mfsearcher/mfsearcher.dtd
Ex.) /mfsearcher/kegg/range?lowerMs=500&upperMs=501&output=xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mfsearcher SYSTEM
  "https://webs2.kazusa-db.jp/mfsearcher/mfsearcher.dtd">
<mfsearcher>

  <search-info>

    <search-date>2010-12-17 15:35:22
    </search-date>

    <database-name>KEGG
    </database-name>

    <search-mode>range
    </search-mode>

    <lower-mass>500.0
    </lower-mass>
  :
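The txt and txth formats can be read with a few lines of code. The Python sketch below is not an official parser: it assumes that fields, including the header name/value pairs, are separated by single tab characters as described above, takes the column names from the last txth header row (falling back to the documented column order for plain txt), and pads rows whose trailing empty columns are missing, as in the ExactMassDB sample. The helper names are hypothetical.

```python
# Documented column order of the "txt" format.
TXT_COLUMNS = ["db-name", "molecular-formula", "dbe", "formula-weight", "id", "description"]


def parse_mfsearcher_text(text: str):
    """Parse "txt" or "txth" output into (header dict, list of record dicts)."""
    header_lines, records = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            header_lines.append(line.lstrip("#").strip())
        else:
            records.append(line.split("\t"))
    columns, header = TXT_COLUMNS, {}
    if header_lines:
        # The last header row holds the column names of the result rows.
        columns = header_lines[-1].split("\t")
        for entry in header_lines[:-1]:
            key, _, value = entry.partition("\t")
            header[key] = value
    rows = []
    for fields in records:
        fields += [""] * (len(columns) - len(fields))  # pad missing trailing columns
        row = dict(zip(columns, fields))
        row["dbe"] = float(row["dbe"])
        row["formula-weight"] = float(row["formula-weight"])
        rows.append(row)
    return header, rows


def is_limited(header: dict) -> bool:
    """True if the "limit" parameter truncated the results (txth only)."""
    return header.get("result-limitation_is-limited") == "true"
```

After a txth query, `is_limited(header)` indicates whether the result list was cut off by the "limit" parameter, in which case the query can be repeated with a larger limit or a narrower mass range.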

XML Elements (tags)

The meanings of the XML elements (tags) are as follows. The element names are used in both "xml" and "txth" formats.

Element name Description Value
mfsearcher Shows that the XML document is for Kazusa MFSearcher. (none)
database-info Information about the databases in MFSearcher. It will be returned when the "version" method is used. (none)
name Database name. text
record-num Record number in the database. integer
update-date The update date of the database, i.e. the date on which the compound data for the database in MFSearcher were constructed. Note that "update-date" does not indicate the versions of the original databases, KEGG, PubChem, and KNApSAcK. YYYY-MM-DD
search-info Information about the search query. It will be returned when "range" or "mass" method is used. (none)
search-date The date and time the query was executed. YYYY-MM-DD HH:mm:ss
database-name The database name for searching. text
search-mode The mode of the searching. "range" or "mass"
target-mass The value for "targetMs" parameter (the center of the search mass range). It will only appear when the "mass" method was executed. real number
margin The value for "margin" parameter. It will only appear when the "mass" method was executed. real number
margin-unit The value for "marginUnit" parameter. It will only appear when the "mass" method was executed. "ppm" or "ms"
lower-mass The lower limit mass actually used for the search, which may differ from the value given as the "lowerMs" parameter in the "range" method. real number
upper-mass The upper limit mass actually used for the search, which may differ from the value given as the "upperMs" parameter in the "range" method. real number
result-limitation Information on the limitation of result numbers. (none)
set-value The set value of the limitation. integer
is-limited When the number of hit records exceeds the set value, it is set to "true". Otherwise, it is "false". "true" or "false"
result-record-number The number of results actually returned. integer
search-results Information of the search results. (none)
result Information of a record in the results. (none)
db-name Database name. text
molecular-formula Molecular formula (elemental composition) text
dbe DBE (double bond equivalent). The value is calculated only for the ExactMassDB. In the other databases, the value is always set to 0. real number
formula-weight Formula weight (molecular weight). The exact mass values are calculated from the atomic weight list available at
/mfsearcher/atomlist
real number
id ID in the original compound database. It is always blank for the ExactMassDB and Pep1000 databases.
In the results of UC2 search, each pair of database signature and compound ID is written as "Signature:Compound ID". When several compounds share the same connectivity, the pairs are joined with commas.
Square brackets ("[ ]") containing letters may precede the compound ID. The letters indicate how the compound is registered in the original database: a number, the charge of the molecule; f, multiple components are registered in the record (e.g., salts); r, the compound is a radical.
text
description Name or description of the formula. It is always blank for ExactMassDB.
In the results of UC2 search, the shortest description among the searched databases is shown as a representative.
text
inchikey-skeleton The first block (14 letters) of InChIKey. This item is only given to the results from UC2 search. text
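For output=xml, the elements listed above can be read with Python's standard xml.etree.ElementTree module. This sketch is illustrative: because only part of the XML is shown on this page, it looks elements up anywhere in the document (".//") instead of relying on a particular nesting, and it strips the whitespace that appears inside elements in the sample. The function name is hypothetical; mfsearcher.dtd remains the authoritative definition of the structure.

```python
import xml.etree.ElementTree as ET

# Child elements of each "result" element, as listed in the table above.
RESULT_FIELDS = ["db-name", "molecular-formula", "dbe", "formula-weight",
                 "id", "description"]


def parse_mfsearcher_xml(xml_text: str):
    """Return (is_limited, records) from output=xml text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    limited = (root.findtext(".//is-limited") or "").strip() == "true"
    records = []
    for result in root.iter("result"):
        # Element text may carry surrounding whitespace/newlines, as in the sample.
        records.append({name: (result.findtext(name) or "").strip()
                        for name in RESULT_FIELDS})
    return limited, records
```

Numeric fields such as formula-weight are returned as strings and can be converted with float().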

Sample Program

The following is an example of using MFSearcher from a programming language.

Perl

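use strict;
use warnings;
use LWP::Simple;

# Search ExactMassDB between 500 and 500.001 and request the "txth" format (with header)
my $url = "https://webs2.kazusa-db.jp/mfsearcher/exmassdb/range?lowerMs=500&upperMs=500.001&output=txth";
my $res = get($url);
die "Could not retrieve $url\n" unless defined $res;
print $res;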

Download MFSearcher GUI Tool

A Java GUI tool for searching with MFSearcher is available.

zip file MFSearcher_1.6.1.zip (1.41 MB, 2019/8/28 updated)
Manual MFSearcher_manual_en.pdf (367 KB, English)
MFSearcher_manual_ja.pdf (511 KB, Japanese)
MFSearcher_manual_ko.pdf for v.1.5.6 (468 KB, Korean)

Licensing Information

In the MFSearcher system, the compound data provided by KEGG, Flavonoid Viewer, LIPID MAPS, HMDB, PubChem, and UNPD were downloaded for academic purposes. The compound data of KNApSAcK are provided by Prof. Kanaya of the Nara Institute of Science and Technology (NAIST). Parts of these data are used to construct dedicated databases for rapid mass searching in the MFSearcher system after re-calculating the molecular weights. When using the results of MFSearcher searches against these databases, please comply with the terms of use of each original database.

Creative Commons License
The searching system of MFSearcher, the ExactMassDB database, and the Pep1000 database by Kazusa DNA Research Institute are licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.

References

MFSearcher
Sakurai N, Narise T, Sim JS, Lee CM, Ikeda C, Akimoto N and Kanaya S (2018) UC2 search: using unique connectivity of uncharged compounds for metabolite annotation by database searching in mass spectrometry-based metabolomics. Bioinformatics 34 (4): 698-700
[PMID: 29040459]

Sakurai N, Ara T, Kanaya S, Nakamura Y, Iijima Y, Enomoto M, Motegi T, Aoki K, Suzuki H and Shibata D (2013) Application of a relational database system for high-throughput prediction of elemental compositions from accurate mass values. Bioinformatics 29 (2): 290-291
[PMID: 23162084]

KNApSAcK
Shinbo Y, Nakamura Y, Altaf-Ul-Amin M, Asahi H, Kurokawa K, Arita M, Saito K, Ohta D, Shibata D and Kanaya S (2006). KNApSAcK: A comprehensive species-metabolite relationship database. Biotechnology in Agriculture and Forestry. K. Saito, R. A. Dixon and L. Willmitzer. Berlin Heidelberg, Springer-Verlag. 57: 165-181.

The Seven Golden Rules
Kind T and Fiehn O (2007) Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics 8: 105

Acknowledgements

The compound data are kindly provided by the following databases.
KEGG, KNApSAcK, Flavonoid Viewer, LIPID MAPS, HMDB, PubChem and UNPD.

A part of the development of MFSearcher was supported by New Energy and Industrial Technology Development Organization (NEDO, Japan) as part of a project entitled the 'Development of Fundamental Technologies for Controlling the Material Production Process of Plants' [P02001]. A part of this work was also supported by Japan Science and Technology Agency (JST, Japan), as part of the project entitled 'Life Science Database Integration Project' of National Bioscience Database Center (NBDC).

About Us

Developing Team

Nozomu Sakurai, Ph.D. Kazusa DNA Research Institute design, development
Shigehiko Kanaya, Ph.D. Nara Institute of Science and Technology cooperation on KNApSAcK data
Hideyuki Suzuki, Ph.D. Kazusa DNA Research Institute  
Daisuke Shibata, Ph.D. Kazusa DNA Research Institute the director

Contact Us

This website is administered by Sakura Scientific Co. Ltd. Please send all inquiries to the following e-mail address.

Nozomu Sakurai: ns AT sakura-kagaku.com (replace "AT" with "@")