Ignite client call fails with "Cluster group is empty"

Exception in thread "main" class org.apache.ignite.IgniteException: Failed to get cache affinity (cache was not started yet or cache was already stopped): logic_info
    at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1025)
    at org.apache.ignite.internal.IgniteComputeImpl.affinityCallAsync0(IgniteComputeImpl.java:344)
    at org.apache.ignite.internal.IgniteComputeImpl.affinityCall(IgniteComputeImpl.java:302)
    at com.rayfay.ignite.test.compute.CallTest.main(CallTest.java:28)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to get cache affinity (cache was not started yet or cache was already stopped): logic_info
    at org.apache.ignite.internal.processors.affinity.GridAffinityProcessor.partition0(GridAffinityProcessor.java:198)
    at org.apache.ignite.internal.processors.affinity.GridAffinityProcessor.partition(GridAffinityProcessor.java:181)
    at org.apache.ignite.internal.processors.affinity.GridAffinityProcessor.partition(GridAffinityProcessor.java:161)
    at org.apache.ignite.internal.IgniteComputeImpl.affinityCallAsync0(IgniteComputeImpl.java:335)
    ... 2 more

The calling code:

Integer rs = ignite.compute().affinityCall("logic_info",key,new Function(key));

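This error is odd: against a locally started node, the first call fails and the second call succeeds.

Against the server environment the first call fails in the same way,

and subsequent calls then fail with a different error: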

Exception in thread "main" class org.apache.ignite.cluster.ClusterGroupEmptyException: Cluster group is empty.
    at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:882)
    at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:880)
    at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1020)
    at org.apache.ignite.internal.IgniteComputeImpl.affinityCall(IgniteComputeImpl.java:305)
    at com.rayfay.ignite.test.compute.CallTest.main(CallTest.java:28)
Caused by: class org.apache.ignite.internal.cluster.ClusterGroupEmptyCheckedException: Cluster group is empty.
    at org.apache.ignite.internal.util.IgniteUtils.emptyTopologyException(IgniteUtils.java:4860)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor.affinityCall(GridClosureProcessor.java:510)
    at org.apache.ignite.internal.IgniteComputeImpl.affinityCallAsync0(IgniteComputeImpl.java:341)
    at org.apache.ignite.internal.IgniteComputeImpl.affinityCall(IgniteComputeImpl.java:302)
    ... 1 more

In both cases the failure comes from the affinityCall method, so the next step is to debug affinityCall.

final Object affKey0 = ctx.affinity().affinityKey(cacheName, affKey);

int partId = ctx.affinity().partition(cacheName, affKey0);

These two lines, which determine partId, run without problems.

However, when GridClosureProcessor executes the following line, the node it gets back is null:

final ClusterNode node = ctx.affinity().mapPartitionToNode(cacheName, partId, mapTopVer);

Because node is null, the method returns immediately with an error:

if (node == null)
    return ComputeTaskInternalFuture.finishedFuture(ctx, T5.class, U.emptyTopologyException());
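
To make this failure path concrete, here is a minimal, Ignite-free sketch of the lookup. It assumes the assignment is a list with one owner list per partition; firstOrNull is a hypothetical stand-in for F.first(...), and nodes are plain strings rather than ClusterNode instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PartitionLookupSketch {
    /** Hypothetical stand-in for F.first(...): first element, or null for a null or empty list. */
    static <T> T firstOrNull(List<T> list) {
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    /** Maps a partition to its first owner, or null when no owner is known for it. */
    static String mapPartitionToNode(List<List<String>> assignment, int partId) {
        return assignment == null ? null : firstOrNull(assignment.get(partId));
    }

    public static void main(String[] args) {
        // Nine partitions, but only partition 0 carries an owner list.
        List<List<String>> assignment = new ArrayList<>(Collections.nCopies(9, (List<String>) null));
        assignment.set(0, Arrays.asList("node-A", "node-B"));

        System.out.println(mapPartitionToNode(assignment, 0)); // node-A
        System.out.println(mapPartitionToNode(assignment, 5)); // null
    }
}
```

A null result from this lookup is exactly the condition that the real code turns into the "Cluster group is empty" exception.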

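The focus now is on why partId was not mapped to a node.

The log shows that when the client starts, the addresses resolved by TcpDiscoveryZookeeperIpFinder include local (loopback) addresses: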

[2019-04-16T15:17:04,002][INFO ][tcp-client-disco-msg-worker-#4%ignite-baodao%][TcpDiscoveryZookeeperIpFinder] ZooKeeper IP Finder resolved addresses: [/192.168.106.101:47500, /192.168.106.100:47500, /192.168.106.103:47500, /192.168.106.102:47500, /127.0.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500]
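
A small sketch for flagging loopback entries in such a list. The entry format ("/host:port", optionally with an IPv6 scope id such as "%lo") is taken from the log line above; the class and method names are made up for illustration and are not part of Ignite.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LoopbackFilterSketch {
    /** Extracts the IP literal from an entry such as "/127.0.0.1:47500". */
    static String hostOf(String entry) {
        String s = entry.startsWith("/") ? entry.substring(1) : entry;
        s = s.substring(0, s.lastIndexOf(':'));   // drop the ":port" suffix
        int scope = s.indexOf('%');               // drop an IPv6 scope id such as "%lo"
        return scope >= 0 ? s.substring(0, scope) : s;
    }

    /** Returns the entries whose address is a loopback address. */
    static List<String> loopbackEntries(List<String> entries) {
        List<String> res = new ArrayList<>();
        for (String e : entries) {
            try {
                // IP literals are parsed locally; no DNS lookup is needed for them.
                if (InetAddress.getByName(hostOf(e)).isLoopbackAddress())
                    res.add(e);
            }
            catch (UnknownHostException ex) {
                throw new IllegalArgumentException("Not an IP literal: " + e, ex);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        List<String> resolved = Arrays.asList(
            "/192.168.106.101:47500", "/192.168.106.100:47500",
            "/127.0.0.1:47500", "/0:0:0:0:0:0:0:1%lo:47500");

        System.out.println(loopbackEntries(resolved));
    }
}
```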

After shutting down the local node, "Cluster group is empty" occurs on every call:

Exception in thread "main" class org.apache.ignite.cluster.ClusterGroupEmptyException: Cluster group is empty.
	at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:882)
	at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:880)
	at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1020)
	at org.apache.ignite.internal.IgniteComputeImpl.affinityCall(IgniteComputeImpl.java:305)
	at com.rayfay.ignite.test.compute.CallTest.main(CallTest.java:28)
Caused by: class org.apache.ignite.internal.cluster.ClusterGroupEmptyCheckedException: Cluster group is empty.
	at org.apache.ignite.internal.util.IgniteUtils.emptyTopologyException(IgniteUtils.java:4860)
	at org.apache.ignite.internal.processors.closure.GridClosureProcessor.affinityCall(GridClosureProcessor.java:510)
	at org.apache.ignite.internal.IgniteComputeImpl.affinityCallAsync0(IgniteComputeImpl.java:341)
	at org.apache.ignite.internal.IgniteComputeImpl.affinityCall(IgniteComputeImpl.java:302)
	... 1 more

From here on the effort goes into analyzing this error.

The immediate cause is the following line, which returns null:

final ClusterNode node = ctx.affinity().mapPartitionToNode(cacheName, partId, mapTopVer);

mapPartitionToNode is defined in the class GridAffinityProcessor; the core of it is the call to affinityCache:

/**
     * Maps partition to a node.
     *
     * @param cacheName Cache name.
     * @param partId partition.
     * @param topVer Affinity topology version.
     * @return Picked node.
     * @throws IgniteCheckedException If failed.
     */
    @Nullable public ClusterNode mapPartitionToNode(String cacheName, int partId, AffinityTopologyVersion topVer)
        throws IgniteCheckedException {
        assert cacheName != null;

        AffinityInfo affInfo = affinityCache(cacheName, topVer);

        return affInfo != null ? F.first(affInfo.assignment().get(partId)) : null;
    }
 /**
     * @param cacheName Cache name.
     * @param topVer Topology version.
     * @return Affinity cache.
     * @throws IgniteCheckedException In case of error.
     */
    @SuppressWarnings("ErrorNotRethrown")
    @Nullable private AffinityInfo affinityCache(final String cacheName, AffinityTopologyVersion topVer)
        throws IgniteCheckedException {

        assert cacheName != null;

        AffinityAssignmentKey key = new AffinityAssignmentKey(cacheName, topVer);

        IgniteInternalFuture<AffinityInfo> fut = affMap.get(key);

        if (fut != null)
            return fut.get();

        GridCacheAdapter<Object, Object> cache = ctx.cache().internalCache(cacheName);

        if (cache != null) {
            GridCacheContext<Object, Object> cctx = cache.context();

            cctx.awaitStarted();

            AffinityAssignment assign0 = cctx.affinity().assignment(topVer);

            try {
                cctx.gate().enter();
            }
            catch (IllegalStateException ignored) {
                return null;
            }

            try {
                GridAffinityAssignment assign = assign0 instanceof GridAffinityAssignment ?
                    (GridAffinityAssignment)assign0 :
                    new GridAffinityAssignment(topVer, assign0.assignment(), assign0.idealAssignment(), assign0.mvccCoordinator());

                AffinityInfo info = new AffinityInfo(
                    cctx.config().getAffinity(),
                    cctx.config().getAffinityMapper(),
                    assign,
                    cctx.cacheObjectContext());

                IgniteInternalFuture<AffinityInfo> old = affMap.putIfAbsent(key, new GridFinishedFuture<>(info));

                if (old != null)
                    info = old.get();

                return info;
            }
            finally {
                cctx.gate().leave();
            }
        }

        Collection<ClusterNode> cacheNodes = ctx.discovery().cacheNodes(cacheName, topVer);

        if (F.isEmpty(cacheNodes))
            return null;

        GridFutureAdapter<AffinityInfo> fut0 = new GridFutureAdapter<>();

        IgniteInternalFuture<AffinityInfo> old = affMap.putIfAbsent(key, fut0);

        if (old != null)
            return old.get();

        int max = ERROR_RETRIES;
        int cnt = 0;

        Iterator<ClusterNode> it = cacheNodes.iterator();

        // We are here because affinity has not been fetched yet, or cache mode is LOCAL.
        while (true) {
            cnt++;

            if (!it.hasNext())
                it = cacheNodes.iterator();

            // Double check since we deal with dynamic view.
            if (!it.hasNext())
                // Exception will be caught in this method.
                throw new IgniteCheckedException("No cache nodes in topology for cache name: " + cacheName);

            ClusterNode n = it.next();

            CacheMode mode = ctx.cache().cacheMode(cacheName);

            if (mode == null) {
                if (ctx.clientDisconnected())
                    throw new IgniteClientDisconnectedCheckedException(ctx.cluster().clientReconnectFuture(),
                            "Failed to get affinity mapping, client disconnected.");

                throw new IgniteCheckedException("No cache nodes in topology for cache name: " + cacheName);
            }

            // Map all keys to a single node, if the cache mode is LOCAL.
            if (mode == LOCAL) {
                fut0.onDone(new IgniteCheckedException("Failed to map keys for LOCAL cache."));

                // Will throw exception.
                fut0.get();
            }

            try {
                // Resolve cache context for remote node.
                // Set affinity function before counting down on latch.
                fut0.onDone(affinityInfoFromNode(cacheName, topVer, n));

                break;
            }
            catch (IgniteCheckedException e) {
                if (log.isDebugEnabled())
                    log.debug("Failed to get affinity from node (will retry) [cache=" + cacheName +
                        ", node=" + U.toShortString(n) + ", msg=" + e.getMessage() + ']');

                if (cnt < max) {
                    U.sleep(ERROR_WAIT);

                    continue;
                }

                affMap.remove(key, fut0);

                fut0.onDone(new IgniteCheckedException("Failed to get affinity mapping from node: " + n, e));

                break;
            }
            catch (RuntimeException | Error e) {
                fut0.onDone(new IgniteCheckedException("Failed to get affinity mapping from node: " + n, e));

                break;
            }
        }

        return fut0.get();
    }
affinityCache is a fairly long method. On the first call, pay attention to this commented section:
// Resolve cache context for remote node.
// Set affinity function before counting down on latch.
fut0.onDone(affinityInfoFromNode(cacheName, topVer, n));
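
The retry loop around this call can be sketched without Ignite as follows. This is a simplification under stated assumptions: nodes are plain strings, the fetch function is a hypothetical placeholder for affinityInfoFromNode, maxAttempts and waitMs stand in for ERROR_RETRIES and ERROR_WAIT (whose values are not shown here), and every RuntimeException is retried, whereas the original retries only on IgniteCheckedException.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class RoundRobinRetrySketch {
    /** Tries the nodes in round-robin order until one call succeeds or maxAttempts is used up. */
    static <R> R fetchWithRetry(List<String> nodes, int maxAttempts, long waitMs, Function<String, R> fetch) {
        if (nodes.isEmpty())
            throw new IllegalStateException("No cache nodes in topology");

        Iterator<String> it = nodes.iterator();
        RuntimeException last = null;

        for (int cnt = 1; cnt <= maxAttempts; cnt++) {
            // Wrap around to the first node once the iterator is exhausted.
            if (!it.hasNext())
                it = nodes.iterator();

            String n = it.next();

            try {
                return fetch.apply(n);
            }
            catch (RuntimeException e) {
                last = e;

                if (cnt < maxAttempts)
                    sleepQuietly(waitMs);
            }
        }

        throw new IllegalStateException("Failed to get affinity mapping after " + maxAttempts + " attempts", last);
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        String res = fetchWithRetry(Arrays.asList("node-A", "node-B"), 3, 10L, n -> {
            if (n.equals("node-A"))
                throw new IllegalStateException("node-A is not ready");
            return "affinity from " + n;
        });

        System.out.println(res); // affinity from node-B
    }
}
```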

To set the affinity function, the method affinityInfoFromNode is called, and the first thing that method does is send an AffinityJob request:

GridTuple3<GridAffinityMessage, GridAffinityMessage, GridAffinityAssignment> t = ctx.closure()
    .callAsyncNoFailover(BROADCAST, affinityJob(cacheName, topVer), F.asList(n), true/*system pool*/, 0, false).get();
/**
     * Requests {@link AffinityFunction} and {@link AffinityKeyMapper} from remote node.
     *
     * @param cacheName Name of cache on which affinity is requested.
     * @param topVer Topology version.
     * @param n Node from which affinity is requested.
     * @return Affinity cached function.
     * @throws IgniteCheckedException If either local or remote node cannot get deployment for affinity objects.
     */
    private AffinityInfo affinityInfoFromNode(String cacheName, AffinityTopologyVersion topVer, ClusterNode n)
        throws IgniteCheckedException {
        GridTuple3<GridAffinityMessage, GridAffinityMessage, GridAffinityAssignment> t = ctx.closure()
            .callAsyncNoFailover(BROADCAST, affinityJob(cacheName, topVer), F.asList(n), true/*system pool*/, 0, false).get();

        AffinityFunction f = (AffinityFunction)unmarshall(ctx, n.id(), t.get1());
        AffinityKeyMapper m = (AffinityKeyMapper)unmarshall(ctx, n.id(), t.get2());

        assert m != null;

        // Bring to initial state.
        f.reset();
        m.reset();

        CacheConfiguration ccfg = ctx.cache().cacheConfiguration(cacheName);

        return new AffinityInfo(f, m, t.get3(), ctx.cacheObjects().contextForCache(ccfg));
    }

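Debugging shows that the third element of the returned tuple raises a NullPointerException when the debugger inspects it; the details follow below.

The NullPointerException itself is not the real problem: it occurs only because toString() cannot be evaluated on the object. The debugger shows: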

Method threw 'java.lang.NullPointerException' exception. Cannot evaluate org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.toString()

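The assignment is shown with a size of 9, but only one element has a value; the others are null.

The number 9 stands out because it is the partition count configured in the startup XML: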

<property name="affinity">
                        <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                            <property name="excludeNeighbors" value="true"/>
                            <property name="partitions" value="9"/>
                        </bean>
                    </property>
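
The partitions property fixes the length of the assignment list: one owner list per partition. The following simplified, Ignite-free sketch illustrates the rendezvous (highest-random-weight) idea behind that list. It is not Ignite's implementation: the weight function (Objects.hash), the node names, and the assign helper are assumptions made for illustration, and excludeNeighbors is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

final class RendezvousSketch {
    /** Returns one owner list per partition: the (backups + 1) nodes with the highest weight. */
    static List<List<String>> assign(List<String> nodes, int partitions, int backups) {
        List<List<String>> res = new ArrayList<>(partitions);

        for (int p = 0; p < partitions; p++) {
            final int part = p;

            List<String> ranked = new ArrayList<>(nodes);

            // Highest weight first; the node name breaks ties so the order is deterministic.
            ranked.sort(Comparator
                .comparingInt((String n) -> Objects.hash(n, part))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));

            res.add(new ArrayList<>(ranked.subList(0, Math.min(backups + 1, ranked.size()))));
        }

        return res;
    }

    public static void main(String[] args) {
        List<List<String>> assignment = assign(Arrays.asList("n1", "n2", "n3", "n4"), 9, 1);

        System.out.println("partitions: " + assignment.size()); // partitions: 9

        for (int p = 0; p < assignment.size(); p++)
            System.out.println(p + " -> " + assignment.get(p));
    }
}
```

With a healthy assignment every one of the nine entries holds a non-empty owner list, which is what makes a list with eight null entries suspicious.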

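The code below shows what AffinityJob does internally; the next step is to work out what that third element actually is.

The method is in the class GridAffinityUtils. Debugging locally shows that the value is obtained correctly; only when running against the server does it come back null.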

/** {@inheritDoc} */
        @Override public GridTuple3<GridAffinityMessage, GridAffinityMessage, GridAffinityAssignment> call()
            throws Exception {
            assert ignite != null;
            assert log != null;

            IgniteKernal kernal = ((IgniteKernal) ignite);

            GridCacheContext<Object, Object> cctx = kernal.internalCache(cacheName).context();

            assert cctx != null;

            GridKernalContext ctx = kernal.context();

            cctx.affinity().affinityReadyFuture(topVer).get();

            AffinityAssignment assign0 = cctx.affinity().assignment(topVer);

            GridAffinityAssignment assign = assign0 instanceof GridAffinityAssignment ?
                (GridAffinityAssignment)assign0 :
                new GridAffinityAssignment(topVer, assign0.assignment(), assign0.idealAssignment(), assign0.mvccCoordinator());

            return F.t(
                affinityMessage(ctx, cctx.config().getAffinity()),
                affinityMessage(ctx, cctx.config().getAffinityMapper()),
                assign);
        }

Next, look at the class GridAffinityAssignment to find out why it did not receive the node data.

It implements the interface

org.apache.ignite.internal.processors.affinity.AffinityAssignment
/**
 * Cached affinity calculations.
 */
public interface AffinityAssignment {
    /**
     * @return Affinity assignment computed by affinity function.
     */
    public List<List<ClusterNode>> idealAssignment();

    /**
     * @return Affinity assignment.
     */
    public List<List<ClusterNode>> assignment();

    /**
     * @return Topology version.
     */
    public AffinityTopologyVersion topologyVersion();

    /**
     * Get affinity nodes for partition.
     *
     * @param part Partition.
     * @return Affinity nodes.
     */
    public List<ClusterNode> get(int part);

    /**
     * Get affinity node IDs for partition.
     *
     * @param part Partition.
     * @return Affinity nodes IDs.
     */
    public HashSet<UUID> getIds(int part);

    /**
     * @return Nodes having primary and backup assignments.
     */
    public Set<ClusterNode> nodes();

    /**
     * @return Nodes having primary partitions assignments.
     */
    public Set<ClusterNode> primaryPartitionNodes();

    /**
     * Get primary partitions for specified node ID.
     *
     * @param nodeId Node ID to get primary partitions for.
     * @return Primary partitions for specified node ID.
     */
    public Set<Integer> primaryPartitions(UUID nodeId);

    /**
     * Get backup partitions for specified node ID.
     *
     * @param nodeId Node ID to get backup partitions for.
     * @return Backup partitions for specified node ID.
     */
    public Set<Integer> backupPartitions(UUID nodeId);

    /**
     * @return Mvcc coordinator.
     */
    public MvccCoordinator mvccCoordinator();
}

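The assignment() method returns the node lists. During the call, the assignment field holds an ArrayList of size 9, but on inspection only one entry has a value.

Now open the implementing class GridAffinityAssignment: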

 /** Collection of calculated affinity nodes. */
    private List<List<ClusterNode>> assignment;

The data structure of assignment is a List nested inside a List.
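
As an illustration of how this nested list is read: the outer index is the partition id, and each inner list holds that partition's owners with the primary node first and the backups after it. The sketch below derives per-node views similar to primaryPartitions and backupPartitions from the interface above. It uses plain strings instead of ClusterNode, and the helper class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class AssignmentViewSketch {
    /** Partitions for which the given node is the first (primary) owner. */
    static Set<Integer> primaryPartitions(List<List<String>> assignment, String node) {
        Set<Integer> res = new TreeSet<>();

        for (int p = 0; p < assignment.size(); p++) {
            List<String> owners = assignment.get(p);

            if (owners != null && !owners.isEmpty() && owners.get(0).equals(node))
                res.add(p);
        }

        return res;
    }

    /** Partitions for which the given node is an owner but not the first one. */
    static Set<Integer> backupPartitions(List<List<String>> assignment, String node) {
        Set<Integer> res = new TreeSet<>();

        for (int p = 0; p < assignment.size(); p++) {
            List<String> owners = assignment.get(p);

            if (owners != null && owners.indexOf(node) > 0)
                res.add(p);
        }

        return res;
    }

    public static void main(String[] args) {
        List<List<String>> assignment = Arrays.asList(
            Arrays.asList("n1", "n2"),
            Arrays.asList("n2", "n1"),
            Arrays.asList("n1", "n3"));

        System.out.println(primaryPartitions(assignment, "n1")); // [0, 2]
        System.out.println(backupPartitions(assignment, "n1"));  // [1]
    }
}
```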

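The next question is how this assignment is initialized.

As a first step, copy AffinityJob into a local class, call it, and look at the logs.

With the test code below, the result matches what the debugger showed earlier: the list still has a length of 9, but only one entry has content.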

public class BasicTest extends AbstractPerformanceTest {

    public static void main(String[] args) {
        String confFile = "ignite-client2.xml";

        try (Ignite ignite = connectToServer(confFile)) {
            IgniteComputeImpl impl = (IgniteComputeImpl)ignite.compute();
            GridKernalContext ctx = (GridKernalContext)ToolKits.unsafeGet(impl,"ctx");
            AffinityTopologyVersion version = ctx.cache().context().exchange().readyAffinityVersion();
            Collection<GridAffinityAssignment> assignments = ignite.compute().broadcast(new MyAffinityJob(Tables.jk_gdxc_info,version));
            for(GridAffinityAssignment assignment:assignments)
            {
                System.out.println(assignment.assignment());
            }
        }
    }
}
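
ToolKits.unsafeGet is the author's own helper and its source is not shown. A hypothetical equivalent that reads a private field by reflection could look like the sketch below; the class name, method name, and the Holder demo class are assumptions, and the real helper may work differently (for example through sun.misc.Unsafe).

```java
import java.lang.reflect.Field;

final class FieldPeekSketch {
    /** Demo class with a private field, used only to exercise readField. */
    static final class Holder {
        private final int value;

        Holder(int value) {
            this.value = value;
        }
    }

    /** Reads a (possibly private) field by name, searching the class and its superclasses. */
    static Object readField(Object target, String name) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(name);

                f.setAccessible(true);

                return f.get(target);
            }
            catch (NoSuchFieldException ignored) {
                // Not declared on this class; continue with the superclass.
            }
            catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }

        throw new IllegalArgumentException("No field '" + name + "' on " + target.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(readField(new Holder(42), "value")); // 42
    }
}
```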

MyAffinityJob looks like this:

public class MyAffinityJob implements
        IgniteCallable<GridAffinityAssignment>,
        Externalizable {
    /** */
    private static final long serialVersionUID = 0L;

    /** */
    @IgniteInstanceResource
    private Ignite ignite;

    /** */
    @LoggerResource
    private IgniteLogger log;

    /** */
    private String cacheName;

    /** */
    private AffinityTopologyVersion topVer;

    /**
     * @param cacheName Cache name.
     * @param topVer    Topology version.
     */
    public MyAffinityJob(@Nullable String cacheName, @NotNull AffinityTopologyVersion topVer) {
        this.cacheName = cacheName;
        this.topVer = topVer;
    }

    /**
     *
     */
    public MyAffinityJob() {
        // No-op.
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public GridAffinityAssignment call()
            throws Exception {
        assert ignite != null;
        assert log != null;

        IgniteKernal kernal = ((IgniteKernal) ignite);

        GridCacheContext<Object, Object> cctx = kernal.internalCache(cacheName).context();

        assert cctx != null;

        GridKernalContext ctx = kernal.context();

        cctx.affinity().affinityReadyFuture(topVer).get();

        AffinityAssignment assign0 = cctx.affinity().assignment(topVer);

        // Start inspecting the data.
        GridCacheAffinityManager manager = cctx.affinity();
        GridAffinityAssignmentCache cache = cctx.group().affinity();
        // GridAffinityAssignmentCache has a "head" field that holds the current GridAffinityAssignment.
        // "affCache" holds the historical assignments.
        AtomicReference<GridAffinityAssignment> head = (AtomicReference<GridAffinityAssignment>)ToolKits.unsafeGet(cache,"head");
        log.info("head topologyVersion:"+head.get().topologyVersion());
        log.info("head assignment:"+head.get().assignment());
        ConcurrentNavigableMap<AffinityTopologyVersion, HistoryAffinityAssignment> affCache = (ConcurrentNavigableMap<AffinityTopologyVersion, HistoryAffinityAssignment>)ToolKits.unsafeGet(cache,"affCache");
        // Concatenate so that the label is part of the logged message.
        log.info("affCache:" + affCache);

        GridAffinityAssignment assign = assign0 instanceof GridAffinityAssignment ?
                (GridAffinityAssignment) assign0 : null;

        GridAffinityAssignment assignment = head.get();
        List<List<ClusterNode>> ll = assignment.assignment();
        log.info("Assignment count:"+ll.size());
        for(List<ClusterNode> l:ll)
        {
            log.info("Node count:"+l.size()+"===================================");
            log.info("Nodes:");
            for(ClusterNode clusterNode:l)
            {
                log.info("Node:"+clusterNode);
            }
        }

        return head.get();
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        U.writeString(out, cacheName);
        out.writeObject(topVer);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        cacheName = U.readString(in);
        topVer = (AffinityTopologyVersion) in.readObject();
    }
}

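Something strange shows up here: if the job returns the GridAffinityAssignment object, the data is broken once it reaches the client.

If it returns GridAffinityAssignment.assignment directly, that is, the node lists, the data is correct on the client.

In other words, the data on the server is fine and something goes wrong on the way to the client. The question is why.

The article https://blog.csdn.net/gs80140/article/details/89358677 analyzes the call flow and locates where the result is received and deserialized.

The job is called in two ways.

The first returns the assignment data directly: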

@Override
    public List<List<ClusterNode>> call()
            throws Exception {
        assert ignite != null;
        assert log != null;

        IgniteKernal kernal = ((IgniteKernal) ignite);

        GridCacheContext<Object, Object> cctx = kernal.internalCache(cacheName).context();

        assert cctx != null;

        GridKernalContext ctx = kernal.context();

        cctx.affinity().affinityReadyFuture(topVer).get();

        AffinityAssignment assign0 = cctx.affinity().assignment(topVer);

        // Start inspecting the data.
        GridCacheAffinityManager manager = cctx.affinity();
        GridAffinityAssignmentCache cache = cctx.group().affinity();
        // GridAffinityAssignmentCache has a "head" field that holds the current GridAffinityAssignment.
        // "affCache" holds the historical assignments.
        AtomicReference<GridAffinityAssignment> head = (AtomicReference<GridAffinityAssignment>)ToolKits.unsafeGet(cache,"head");
        log.info("head topologyVersion:"+head.get().topologyVersion());
        log.info("head assignment:"+head.get().assignment());
        ConcurrentNavigableMap<AffinityTopologyVersion, HistoryAffinityAssignment> affCache = (ConcurrentNavigableMap<AffinityTopologyVersion, HistoryAffinityAssignment>)ToolKits.unsafeGet(cache,"affCache");
        // Concatenate so that the label is part of the logged message.
        log.info("affCache:" + affCache);

        GridAffinityAssignment assign = assign0 instanceof GridAffinityAssignment ?
                (GridAffinityAssignment) assign0 : null;

        GridAffinityAssignment assignment = head.get();
        List<List<ClusterNode>> ll = assignment.assignment();
        log.info("Assignment count:"+ll.size());
        for(List<ClusterNode> l:ll)
        {
            log.info("Node count:"+l.size()+"===================================");
            log.info("Nodes:");
            for(ClusterNode clusterNode:l)
            {
                log.info("Node:"+clusterNode);
            }
        }

        return ll;
    }

The second keeps the original approach and returns the GridAffinityAssignment instance:

@Override
    public GridAffinityAssignment call()
            throws Exception {
        assert ignite != null;
        assert log != null;

        IgniteKernal kernal = ((IgniteKernal) ignite);

        GridCacheContext<Object, Object> cctx = kernal.internalCache(cacheName).context();

        assert cctx != null;

        GridKernalContext ctx = kernal.context();

        cctx.affinity().affinityReadyFuture(topVer).get();

        AffinityAssignment assign0 = cctx.affinity().assignment(topVer);

        // Start inspecting the data.
        GridCacheAffinityManager manager = cctx.affinity();
        GridAffinityAssignmentCache cache = cctx.group().affinity();
        // GridAffinityAssignmentCache has a "head" field that holds the current GridAffinityAssignment.
        // "affCache" holds the historical assignments.
        AtomicReference<GridAffinityAssignment> head = (AtomicReference<GridAffinityAssignment>)ToolKits.unsafeGet(cache,"head");
        log.info("head topologyVersion:"+head.get().topologyVersion());
        log.info("head assignment:"+head.get().assignment());
        ConcurrentNavigableMap<AffinityTopologyVersion, HistoryAffinityAssignment> affCache = (ConcurrentNavigableMap<AffinityTopologyVersion, HistoryAffinityAssignment>)ToolKits.unsafeGet(cache,"affCache");
        // Concatenate so that the label is part of the logged message.
        log.info("affCache:" + affCache);

        GridAffinityAssignment assign = assign0 instanceof GridAffinityAssignment ?
                (GridAffinityAssignment) assign0 : null;

        GridAffinityAssignment assignment = head.get();
        List<List<ClusterNode>> ll = assignment.assignment();
        log.info("Assignment count:"+ll.size());
        for(List<ClusterNode> l:ll)
        {
            log.info("Node count:"+l.size()+"===================================");
            log.info("Nodes:");
            for(ClusterNode clusterNode:l)
            {
                log.info("Node:"+clusterNode);
            }
        }

        return head.get();
    }

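With the first method the result is correct: there are 9 entries and each one displays properly.

The response for the first method is 287325 bytes, which is sizeable.

With the second method the result is wrong: there are 9 entries, but they display as empty and only one contains data.

The response is also much smaller, 64230 bytes, far fewer than with the first method.

The client receives too little data, and this needs to be checked from two sides: first, how many bytes the server actually returns; second, how many bytes arrive at the point where the client receives them.

Since the first method returns the data correctly, the data on the server is correct, so the problem would seem to be in transmission. The first thing to establish is whether the code itself is at fault.

After adding serialization code on the server and checking the serialized length there, it turns out to match the length the client receives.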

byte[] bytes = U.marshal(ctx.config().getMarshaller(),ll);

Serialized length, method 1: 287325

byte[] bytes = U.marshal(ctx.config().getMarshaller(),assign);

Serialized length, method 2: 64230

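In other words, the serialization of the assignment already goes wrong on the server.

Could this be a problem in Ignite itself? The plan is to run a separate serialization and deserialization test on GridAffinityAssignment.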
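
Such an isolated round-trip test can be sketched as follows. This version uses plain JDK serialization as a stand-in for the configured Ignite marshaller and a nested list of strings as a stand-in for GridAffinityAssignment, so it shows only the shape of the check (serialize, record the length, deserialize, compare), not Ignite's actual behavior. All names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RoundTripSketch {
    /** Serializes an object to bytes with JDK serialization. */
    static byte[] marshal(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        return bos.toByteArray();
    }

    /** Restores an object from bytes produced by marshal. */
    static Object unmarshal(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True when the restored copy equals the original, i.e. no data was lost. */
    static boolean survivesRoundTrip(Object obj) {
        return obj.equals(unmarshal(marshal(obj)));
    }

    public static void main(String[] args) {
        List<List<String>> assignment = new ArrayList<>();

        for (int p = 0; p < 9; p++)
            assignment.add(new ArrayList<>(Arrays.asList("node-" + (p % 3), "node-" + ((p + 1) % 3))));

        System.out.println("serialized length: " + marshal(assignment).length);
        System.out.println("round trip intact: " + survivesRoundTrip(assignment));
    }
}
```

Running the same check on the server before the result is sent separates a marshalling defect from a transport defect.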

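Before running that isolated test, first run the same program locally.

In the local test the serialized lengths on the server side also look off, yet the data the client receives is correct.

Serialized length, method 1: 376098

Serialized length, method 2: 42135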

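A comparison shows that data fetched in the local setup raises no NullPointerException, whereas for data fetched from the server toString() throws a NullPointerException.

So the next step is to analyze the class GridAffinityAssignment and find out why the NullPointerException occurs.

It has a toString() method: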

 /** {@inheritDoc} */
    @Override public String toString() {
        return S.toString(GridAffinityAssignment.class, this, super.toString());
    }

First, print toString on the server inside a try/catch to check which field causes the error.
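
One way to narrow this down field by field is a small reflective probe. The sketch below is not the code used in this investigation: the probe helper and the Broken demo class are hypothetical, and the probe simply records, for each instance field, either its string form or the exception that toString() throws.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

final class ToStringProbeSketch {
    /** Demo class: one harmless field and one whose toString() throws. */
    static final class Broken {
        private final Object ok = "fine";

        private final Object bad = new Object() {
            @Override public String toString() {
                throw new NullPointerException("boom");
            }
        };
    }

    /** Maps each instance field name to its string form, or to the failure it caused. */
    static Map<String, String> probe(Object target) {
        Map<String, String> res = new LinkedHashMap<>();

        for (Field f : target.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic())
                continue;

            try {
                f.setAccessible(true);

                // String.valueOf calls toString() on non-null values and yields "null" otherwise.
                res.put(f.getName(), String.valueOf(f.get(target)));
            }
            catch (Exception e) {
                res.put(f.getName(), "FAILED: " + e);
            }
        }

        return res;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : probe(new Broken()).entrySet())
            System.out.println(e.getKey() + " = " + e.getValue());
    }
}
```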

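No error occurs on the server. The error appears only after the data is back on the client, because the fields really were not deserialized successfully: every other field is null, and the single list that is present is incomplete.

The next step is to find out what the result looks like on the server and what it looks like when it reaches the client. The suspicion is that something happens during transmission that causes part of the values to be lost.

Debugging on the client shows that the message coming back is of type GridJobExecuteResponse.

After analyzing the source and testing in the debugger, downgrading Ignite from 2.7 to 2.6 resolved the problem.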

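A brief summary:

The main cause is that when affinityJob is invoked to fetch the GridAffinityAssignment, the data returned by the server is incomplete after deserialization. Testing showed that deserialization already fails on the server, before the result is sent. Debugging and analysis indicate that the handling of handles during serialization goes wrong, so that on deserialization not even a single ArrayList is restored successfully. The exact defect has not been found yet.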
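
For readers unfamiliar with handles: a serialization stream writes an object in full the first time it is seen and a short back-reference (handle) on every later occurrence. The sketch below demonstrates this with plain JDK serialization only. Whether Ignite's marshaller fails in a comparable place is an assumption based on the observation above, not something this example proves.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HandleSketch {
    static byte[] marshal(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        return bos.toByteArray();
    }

    static Object unmarshal(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True when the first two elements are still the same instance after a round trip. */
    static boolean sharedAfterRoundTrip(List<?> list) {
        List<?> restored = (List<?>)unmarshal(marshal(list));

        return restored.get(0) == restored.get(1);
    }

    public static void main(String[] args) {
        List<String> owners = new ArrayList<>(Arrays.asList("node-A", "node-B"));

        // The same inner list twice: the second occurrence is written as a handle.
        List<List<String>> shared = Arrays.asList(owners, owners);

        // Two equal but distinct inner lists: both are written in full.
        List<List<String>> copies = Arrays.asList(owners, new ArrayList<>(owners));

        System.out.println("shared bytes: " + marshal(shared).length);
        System.out.println("copies bytes: " + marshal(copies).length);
        System.out.println("identity kept: " + sharedAfterRoundTrip(shared)); // identity kept: true
    }
}
```

If the handle table is corrupted on either side, a back-reference resolves to the wrong object or to nothing, which would match a list that arrives with the right length but missing entries.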

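After downgrading to 2.6 the problem no longer occurs, so the best way forward is to compare what changed between 2.6 and 2.7 in the serialization and deserialization paths.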


Reposted from blog.csdn.net/gs80140/article/details/89333644