Hardware overview: CPU: 24, memory: 20 GB, disk: 10 × 2.7 TB.
Write performance (geo information is not added for IP addresses):

| Write performance comparison | Speed | Commit time (s), 500 × 1000 | Bulk time (s), 1000 dns | Bulk time (s), 1000 tcpflow | Bulk time (s), 1000 weblog | CPU usage (us, sy) | Disk usage | Data volume (tcpflow) | Thread configuration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tantivy | 155272 | 6-19 | 0.01-0.06 | 0.1-0.2 | 0.1-0.2 | 40-80 us, 5-15 | 20-90 | 4_000_000 records, 870M | 10×2+10×2×3 |
| lucene | 151633 | 3-4 | 0.2-0.3 | 1.3-1.4 | 1.3-1.4 | 60-80 us, 5 sy | 20-90 | 4_500_000 records, 1.3G | 10×5 |

Features:
Query: supported query types.

| Query | TermQuery | BooleanQuery | WildcardQuery | PhraseQuery | RangeQuery | FuzzyQuery | RegexpQuery | ConstantScoreQuery | PrefixQuery |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tantivy | Y | Y | Y | Y | Y | Y | Y | Y | N |
| lucene | Y | Y | Y | Y | Y | Y | Y | Y | Y |

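
To make a few of the query types in the table concrete, here is a minimal sketch of term, boolean (AND), and prefix lookups over a toy inverted index. `ToyIndex` and its methods are hypothetical names invented for this illustration; they are not the tantivy or lucene API, and real engines add scoring, analyzers, and compressed postings.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy inverted index (hypothetical, not the tantivy or lucene API):
/// each term maps to the sorted set of doc ids that contain it.
struct ToyIndex {
    postings: BTreeMap<String, BTreeSet<u32>>,
}

impl ToyIndex {
    fn new() -> Self {
        ToyIndex { postings: BTreeMap::new() }
    }

    /// Index a document by splitting its text on whitespace.
    fn add(&mut self, doc: u32, text: &str) {
        for token in text.split_whitespace() {
            self.postings
                .entry(token.to_lowercase())
                .or_default()
                .insert(doc);
        }
    }

    /// TermQuery-like lookup: docs containing exactly this term.
    fn term(&self, term: &str) -> BTreeSet<u32> {
        self.postings.get(term).cloned().unwrap_or_default()
    }

    /// BooleanQuery-like conjunction: docs containing both terms.
    fn both(&self, a: &str, b: &str) -> BTreeSet<u32> {
        let left = self.term(a);
        let right = self.term(b);
        let hits: BTreeSet<u32> = left.intersection(&right).copied().collect();
        hits
    }

    /// PrefixQuery-like lookup: union of the postings of every term
    /// that starts with `prefix` (such terms are contiguous in sorted order).
    fn prefix(&self, prefix: &str) -> BTreeSet<u32> {
        let hits: BTreeSet<u32> = self
            .postings
            .range(prefix.to_string()..)
            .take_while(|(term, _)| term.starts_with(prefix))
            .flat_map(|(_, docs)| docs.iter().copied())
            .collect();
        hits
    }
}

fn main() {
    let mut index = ToyIndex::new();
    index.add(0, "tcp flow log");
    index.add(1, "dns log");
    index.add(2, "tcpflow record");
    println!("term(log)      = {:?}", index.term("log"));
    println!("both(tcp, log) = {:?}", index.both("tcp", "log"));
    println!("prefix(tcp)    = {:?}", index.prefix("tcp"));
}
```

The prefix lookup is just a range scan over the sorted term dictionary, which is one common way such a query can be expressed in terms of the others.
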
Collector: obtains information about the documents matched by a query; used for sorting, filtering, and aggregation.

| Collector | TopCollector | TimeLimitingCollector | CountCollector |
| --- | --- | --- | --- |
| tantivy | Y | N | Y |
| lucene | Y | Y | N |

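
The collector idea can be sketched as a callback that the search loop invokes once per matching document. The `Collector` trait, `run_query`, and the integer scores below are assumptions made for this illustration only; they do not reproduce the tantivy or lucene interfaces.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical collector interface: called once per matching document.
trait Collector {
    fn collect(&mut self, doc: u32, score: u32);
}

/// Counts matching documents and ignores their scores.
struct CountCollector {
    count: usize,
}

impl Collector for CountCollector {
    fn collect(&mut self, _doc: u32, _score: u32) {
        self.count += 1;
    }
}

/// Keeps only the `limit` best hits, using a min-heap of (score, doc).
struct TopCollector {
    limit: usize,
    heap: BinaryHeap<Reverse<(u32, u32)>>,
}

impl TopCollector {
    fn new(limit: usize) -> Self {
        TopCollector { limit, heap: BinaryHeap::new() }
    }

    /// Returns the kept (score, doc) pairs, best score first.
    fn into_sorted(self) -> Vec<(u32, u32)> {
        self.heap
            .into_sorted_vec()
            .into_iter()
            .map(|Reverse(hit)| hit)
            .collect()
    }
}

impl Collector for TopCollector {
    fn collect(&mut self, doc: u32, score: u32) {
        self.heap.push(Reverse((score, doc)));
        if self.heap.len() > self.limit {
            // The heap root is the current worst hit; drop it.
            self.heap.pop();
        }
    }
}

/// Stand-in for the search loop: feeds (doc, score) hits to a collector.
fn run_query<C: Collector>(hits: &[(u32, u32)], collector: &mut C) {
    for &(doc, score) in hits {
        collector.collect(doc, score);
    }
}

fn main() {
    let hits = [(1, 10), (2, 50), (3, 30), (4, 40)];

    let mut top = TopCollector::new(2);
    run_query(&hits, &mut top);
    println!("top 2 (score, doc): {:?}", top.into_sorted());

    let mut count = CountCollector { count: 0 };
    run_query(&hits, &mut count);
    println!("matches: {}", count.count);
}
```

Because the top collector never holds more than `limit` + 1 entries, memory stays bounded no matter how many documents match.
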
Docvalues / fastfield: looks up field values by doc ID; used for sorting, filtering, and aggregation.

| | Docvalues/fastfield |
| --- | --- |
| tantivy | fastfield (currently numeric fields only) |
| lucene | Docvalues |

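
As a rough picture of why a per-document column helps with sorting, filtering, and aggregation, the sketch below stores one numeric value per doc ID in a plain vector. `NumericColumn` and the helper functions are hypothetical; the actual storage layouts in tantivy and lucene are more elaborate.

```rust
/// Hypothetical numeric column: `values[doc_id]` is that document's field value.
struct NumericColumn {
    values: Vec<u64>,
}

impl NumericColumn {
    /// Lookup by doc id; `None` if the doc id is out of range.
    fn get(&self, doc: u32) -> Option<u64> {
        self.values.get(doc as usize).copied()
    }
}

/// Sort: order matching docs by their column value, ascending.
fn sort_by_value(docs: &mut [u32], column: &NumericColumn) {
    docs.sort_by_key(|&doc| column.get(doc));
}

/// Filter: keep docs whose value lies in the inclusive range [lo, hi].
fn filter_range(docs: &[u32], column: &NumericColumn, lo: u64, hi: u64) -> Vec<u32> {
    docs.iter()
        .copied()
        .filter(|&doc| matches!(column.get(doc), Some(v) if v >= lo && v <= hi))
        .collect()
}

/// Aggregate: sum the column over the matching docs.
fn sum_values(docs: &[u32], column: &NumericColumn) -> u64 {
    docs.iter().filter_map(|&doc| column.get(doc)).sum()
}

fn main() {
    // Assumed sample data: bytes transferred per document.
    let bytes = NumericColumn { values: vec![300, 100, 200, 50] };

    let mut matched = vec![0, 1, 2];
    sort_by_value(&mut matched, &bytes);
    println!("sorted by value: {:?}", matched);
    println!("in [100, 250]:   {:?}", filter_range(&[0, 1, 2, 3], &bytes, 100, 250));
    println!("sum:             {}", sum_values(&[0, 1, 2], &bytes));
}
```
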
IndexWriter: writes data.

| IndexWriter | Flush (without fsync, data may be in the buffer) | Commit (fsync to disk) |
| --- | --- | --- |
| tantivy | N (not currently found) | Y |
| lucene | Y | Y |

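
The flush versus commit distinction in the table can be shown at the file level with the Rust standard library alone. This is a sketch of the general idea, not of how either IndexWriter is implemented; the function names and the temporary file name are made up for the example.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// "Flush": push buffered bytes to the operating system.
/// The data may still sit in the OS cache and can be lost on power failure.
fn write_flushed(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    writer.write_all(data)?;
    writer.flush()?;
    Ok(())
}

/// "Commit": flush, then ask the OS to persist data and metadata (fsync).
fn write_committed(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    writer.write_all(data)?;
    writer.flush()?;
    writer.get_ref().sync_all()?;
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical file name used only for this demonstration.
    let path = std::env::temp_dir().join("flush_vs_commit_demo.bin");
    write_flushed(&path, b"flushed only")?;
    write_committed(&path, b"flushed and synced")?;
    println!("wrote {} bytes", std::fs::metadata(&path)?.len());
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Both functions leave the same bytes readable afterwards; the difference is only in the durability guarantee, which is also why a commit is expected to cost more time than a flush.
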
To sum up:
Features
tantivy has implemented most of the features of lucene. The specific differences are listed in the tables above.
Write performance
Overall write performance is similar.
When bulk indexing data, tantivy is faster than lucene.
When committing, tantivy takes longer than lucene (6-19 s versus 3-4 s); see write performance.
Disk Usage
Disk usage is about the same or lower, as described in write performance.
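
Since the two runs in the write performance section indexed different amounts of tcpflow data (4_000_000 entries in 870M for tantivy, 4_500_000 entries in 1.3G for lucene), a per-entry figure makes the comparison easier. The sketch below assumes that "M" and "G" are decimal megabytes and gigabytes, which the source does not state; with binary units the absolute numbers shift but the ordering stays the same.

```rust
/// Average on-disk size per indexed entry, in bytes.
fn bytes_per_record(total_bytes: f64, records: f64) -> f64 {
    total_bytes / records
}

fn main() {
    // Assumption: 870M = 870e6 bytes and 1.3G = 1.3e9 bytes (decimal units).
    let tantivy = bytes_per_record(870e6, 4_000_000.0);
    let lucene = bytes_per_record(1.3e9, 4_500_000.0);

    println!("tantivy: {:.1} bytes per entry", tantivy);
    println!("lucene:  {:.1} bytes per entry", lucene);
    println!("ratio tantivy/lucene: {:.2}", tantivy / lucene);
}
```
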