After the synchronization pitfall described in an earlier post ( storm pit - synchronization problem ), I recently adjusted and refactored the code and ran into another pitfall that anyone developing with Storm should watch out for. Here is an outline of what happened.
In my Storm program, Abolt wraps the data in an object and sends it to Bbolt and Cbolt at the same time. Bbolt and Cbolt each process the object in their own way and then write the result to the database. While reading the logs, I happened to notice that some of the data was wrong in odd ways. At first I suspected the algorithm, but some of the data was correct, so the algorithm was probably fine. After puzzling over it for a while, I added more detailed logging and looked for a pattern in the bad records. I finally realized that the changes Bbolt made to the object after receiving it were affecting Cbolt. I am now almost certain of the following: when Bbolt and Cbolt run in the same process, they receive the same object instance, so a modification made in Bbolt is visible in Cbolt and vice versa. When Bbolt and Cbolt run in different processes, each one receives its own deserialized copy and there is no such effect. This explains why some of the data was normal and some was not.
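The effect can be sketched in plain Java, without Storm. This is only an illustration under stated assumptions: `Payload` and `SharedTupleDemo` are hypothetical names, and Java serialization stands in for whatever serializer the cluster actually uses when a tuple crosses a process boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical payload standing in for the object carried in the tuple.
class Payload implements Serializable {
    private static final long serialVersionUID = 1L;
    private String value;

    Payload(String value) { this.value = value; }
    String getValue() { return value; }
    void setValue(String value) { this.value = value; }
}

class SharedTupleDemo {
    // Same process: both consumers are handed the same reference.
    static String sameProcess() {
        Payload emitted = new Payload("123");
        Payload seenByB = emitted;
        Payload seenByC = emitted;
        seenByB.setValue("modified by B");
        return seenByC.getValue(); // C observes B's change
    }

    // Different processes: each consumer deserializes its own copy.
    static String differentProcess() {
        Payload emitted = new Payload("123");
        Payload seenByB = roundTrip(emitted);
        Payload seenByC = roundTrip(emitted);
        seenByB.setValue("modified by B");
        return seenByC.getValue(); // C's copy is untouched
    }

    // Serialize and deserialize to imitate a transfer between processes
    // (assumption: Java serialization is used here only as a stand-in).
    static Payload roundTrip(Payload p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Payload) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same process:      C sees " + sameProcess());
        System.out.println("different process: C sees " + differentProcess());
    }
}
```

In the first method the consumer that runs second sees the other's change; in the second it does not, which matches the mix of correct and incorrect records described above.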
Here is some example code that reproduces the problem:
Topology building class:
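// Imports assume Apache Storm 1.0 or later (org.apache.storm.*);
// older releases use the backtype.storm.* packages instead.
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class Main {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("test", new TestWordSpout());
        // Both bolts subscribe to the same spout stream, so each tuple goes to both.
        builder.setBolt("print1", new PrintBolt("PrintBolt1")).shuffleGrouping("test");
        builder.setBolt("print2", new PrintBolt("PrintBolt2")).shuffleGrouping("test");

        Config conf = new Config();
        conf.setDebug(false);
        // A single worker keeps both bolts in the same process.
        conf.setNumWorkers(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("test-kafka-1", conf, builder.createTopology());
    }
}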
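Spout class:
// Imports assume Apache Storm 1.0 or later (org.apache.storm.*);
// older releases use the backtype.storm.* packages instead.
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class TestWordSpout extends BaseRichSpout {
    private static final long serialVersionUID = 1L;

    SpoutOutputCollector _collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void close() {
    }

    @Override
    public void nextTuple() {
        Utils.sleep(1000);
        // Name is the payload class (not shown in this post); it exposes getName/setName.
        Name name = new Name();
        name.setName("123");
        // Emit one new Name object per second; both bolts receive it.
        _collector.emit(new Values(name));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}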
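Bolt class:
// Imports assume Apache Storm 1.0 or later (org.apache.storm.*);
// older releases use the backtype.storm.* packages instead.
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class PrintBolt extends BaseRichBolt {
    private static final long serialVersionUID = 1L;

    private String name;
    int taskid;

    public PrintBolt(String name) {
        this.name = name;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context,
            OutputCollector collector) {
        this.taskid = context.getThisTaskId();
    }

    @Override
    public void execute(Tuple input) {
        Name name = (Name) input.getValueByField("word");
        // Print the value as received, then overwrite it with this bolt's own name.
        System.out.println(logPrefix() + name.getName());
        name.setName(this.name);
    }

    private String logPrefix() {
        return this.name + ":";
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing.
    }
}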
Possible execution results:
PrintBolt2:123
PrintBolt1:123
PrintBolt2:123
PrintBolt1:123
PrintBolt2:123
PrintBolt1:123
PrintBolt2:PrintBolt1
PrintBolt2:123
PrintBolt1:123
PrintBolt1:123
PrintBolt2:123
PrintBolt1:123
PrintBolt2:123
As the line "PrintBolt2:PrintBolt1" in the results shows, PrintBolt2 sometimes prints the value that PrintBolt1 has already written into the object.
Now that you know about this behavior, you have to allow for it when writing bolts. If an object is sent to two or more bolts at the same time and any of them needs to modify it, that bolt must clone the object first and modify the copy, never the received object itself!
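A minimal sketch of the copy-before-modify rule, in plain Java so it runs without Storm. The `Name` class below is a hypothetical reconstruction (the post does not show the real one), the copy constructor is one possible way to clone it, and `DefensiveCopyDemo.process` only mimics what a bolt's `execute` would do with the received value.

```java
import java.io.Serializable;

// Hypothetical reconstruction of the Name payload; only getName/setName
// are known from the example code, the copy constructor is an addition.
class Name implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;

    Name() { }

    Name(String name) { this.name = name; }

    // Copy constructor: produces an independent object with the same state.
    Name(Name other) { this.name = other.name; }

    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

class DefensiveCopyDemo {
    // Mimics a bolt's execute(): copy first, then modify only the copy.
    static Name process(Name received, String boltName) {
        Name own = new Name(received);
        own.setName(boltName);
        return own;
    }

    public static void main(String[] args) {
        Name shared = new Name("123");

        // Two "bolts" receive the same instance, as they would in one worker.
        Name fromBolt1 = process(shared, "PrintBolt1");
        Name fromBolt2 = process(shared, "PrintBolt2");

        System.out.println("shared:    " + shared.getName());
        System.out.println("fromBolt1: " + fromBolt1.getName());
        System.out.println("fromBolt2: " + fromBolt2.getName());
    }
}
```

For objects with nested mutable fields the copy has to be deep, otherwise the inner objects are still shared. Making the payload immutable is another way to rule out the problem, at the cost of creating a new object for every change.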