Before getting started, a quick rant about my company's ridiculous environment: work machines have no internet access and USB drives are banned, so any notes I take basically get rewritten at home in the evening. Anyway, enough complaining.
First, a UML diagram of the RPC services HBase starts in its constructor: http://blackproof.iteye.com/blog/2029170
HMaster startup calls its run method. In outline:
HBase startup
The HMaster first handles the backup-master case, then enters the startup sequence proper: 1. ZK tracker classes 2. Create the thread pools and start the Jetty service on port 60010 3. Region server startup 4. HLog handling 5. HMaster assigns the ROOT and META tables 6. Preparation: collect offline servers (with their regions) and the regions assigned in ZK 7. Assign regions 8. Check daughter regions 9. Start the balancer thread
Detailed analysis:
The becomeActiveMaster method runs first. It checks the "hbase.master.backup" config to see whether this node is a backup master; if it is, it simply blocks until it detects that the cluster's active master has died (checking every 3 minutes by default).
private boolean becomeActiveMaster(MonitoredTask startupStatus)
    throws InterruptedException {
  // TODO: This is wrong!!!! Should have new servername if we restart ourselves,
  // if we come back to life.
  this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName, this);
  this.zooKeeper.registerListener(activeMasterManager);
  stallIfBackupMaster(this.conf, this.activeMasterManager);

  // The ClusterStatusTracker is setup before the other
  // ZKBasedSystemTrackers because it's needed by the activeMasterManager
  // to check if the cluster should be shutdown.
  this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
  this.clusterStatusTracker.start();
  return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus);
}
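To make the backup-master stall concrete, here is a toy, self-contained illustration of the behavior described above. It is not the real stallIfBackupMaster body; the AtomicBoolean stands in for the ZK watch that ActiveMasterManager keeps on the master znode, and the 3-minute default mentioned above corresponds to the ZK session timeout (zookeeper.session.timeout, 180 s by default).

import java.util.concurrent.atomic.AtomicBoolean;

/** Toy sketch of the backup-master stall described above, not HBase's actual code. */
public class BackupMasterStallSketch {
  static void stallWhileBackup(boolean isBackup, AtomicBoolean activeMasterAlive,
      long checkIntervalMs) throws InterruptedException {
    if (!isBackup) return;            // normal master: start up immediately
    while (activeMasterAlive.get()) { // backup: block while someone else is active
      Thread.sleep(checkIntervalMs);
    }
    // the active master is gone, so this backup can now take over
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean alive = new AtomicBoolean(true);
    new Thread(() -> {                // simulate the active master dying after 200 ms
      try { Thread.sleep(200); } catch (InterruptedException ignored) {}
      alive.set(false);
    }).start();
    stallWhileBackup(true, alive, 50);
    System.out.println("backup master takes over");
  }
}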
1) ZK tracker classes
All the ZK-based trackers are initialized via initializeZKBasedSystemTrackers,
which registers the following ZK listeners:
1. createCatalogTracker (rootRegionTracker, metaRegionTracker): the trackers for the ROOT and META regions
2. AssignmentManager: manages region assignment
3. RegionServerTracker: monitors region servers and handles dead ones
4. DrainingServerTracker: maintains the draining region server list
void initializeZKBasedSystemTrackers() throws IOException,
    InterruptedException, KeeperException {
  this.catalogTracker = createCatalogTracker(this.zooKeeper, this.conf, this);
  this.catalogTracker.start();

  this.balancer = LoadBalancerFactory.getLoadBalancer(conf);
  this.loadBalancerTracker = new LoadBalancerTracker(zooKeeper, this);
  this.loadBalancerTracker.start();

  this.assignmentManager = new AssignmentManager(this, serverManager,
    this.catalogTracker, this.balancer, this.executorService, this.metricsMaster,
    this.tableLockManager);
  zooKeeper.registerListenerFirst(assignmentManager);

  this.regionServerTracker = new RegionServerTracker(zooKeeper, this,
    this.serverManager);
  this.regionServerTracker.start();

  this.drainingServerTracker = new DrainingServerTracker(zooKeeper, this,
    this.serverManager);
  this.drainingServerTracker.start();

  // Set the cluster as up.  If new RSs, they'll be waiting on this before
  // going ahead with their startup.
  boolean wasUp = this.clusterStatusTracker.isClusterUp();
  if (!wasUp) this.clusterStatusTracker.setClusterUp();

  LOG.info("Server active/primary master=" + this.serverName +
      ", sessionid=0x" +
      Long.toHexString(this.zooKeeper.getRecoverableZooKeeper().getSessionId()) +
      ", setting cluster-up flag (Was=" + wasUp + ")");

  // create the snapshot manager
  this.snapshotManager = new SnapshotManager(this, this.metricsMaster);
}
2) Create the thread pools and start the Jetty service on port 60010
1. Start the executor service pools for the following event types:
MASTER_META_SERVER_OPERATIONS
MASTER_SERVER_OPERATIONS
MASTER_CLOSE_REGION
MASTER_OPEN_REGION
MASTER_TABLE_OPERATIONS
2. The LogCleaner thread, which cleans the .oldlogs directory
3. The Jetty info server on port 60010
void startServiceThreads() throws IOException {
  // Start the executor service pools
  this.executorService.startExecutorService(ExecutorType.MASTER_OPEN_REGION,
    conf.getInt("hbase.master.executor.openregion.threads", 5));
  this.executorService.startExecutorService(ExecutorType.MASTER_CLOSE_REGION,
    conf.getInt("hbase.master.executor.closeregion.threads", 5));
  this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
    conf.getInt("hbase.master.executor.serverops.threads", 5));
  this.executorService.startExecutorService(ExecutorType.MASTER_META_SERVER_OPERATIONS,
    conf.getInt("hbase.master.executor.serverops.threads", 5));
  this.executorService.startExecutorService(ExecutorType.M_LOG_REPLAY_OPS,
    conf.getInt("hbase.master.executor.logreplayops.threads", 10));

  // We depend on there being only one instance of this executor running
  // at a time.  To do concurrency, would need fencing of enable/disable of
  // tables.
  this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);

  // Start log cleaner thread
  String n = Thread.currentThread().getName();
  int cleanerInterval = conf.getInt("hbase.master.cleaner.interval", 60 * 1000);
  this.logCleaner =
    new LogCleaner(cleanerInterval,
      this, conf, getMasterFileSystem().getFileSystem(),
      getMasterFileSystem().getOldLogDir());
  Threads.setDaemonThreadRunning(logCleaner.getThread(), n + ".oldLogCleaner");

  // start the hfile archive cleaner thread
  Path archiveDir = HFileArchiveUtil.getArchivePath(conf);
  this.hfileCleaner = new HFileCleaner(cleanerInterval, this, conf,
      getMasterFileSystem().getFileSystem(), archiveDir);
  Threads.setDaemonThreadRunning(hfileCleaner.getThread(), n + ".archivedHFileCleaner");

  // Start the health checker
  if (this.healthCheckChore != null) {
    Threads.setDaemonThreadRunning(this.healthCheckChore.getThread(), n + ".healthChecker");
  }

  // Start allowing requests to happen.
  this.rpcServer.openServer();
  this.rpcServerOpen = true;
  if (LOG.isTraceEnabled()) {
    LOG.trace("Started service threads");
  }
}
3) Region server startup
serverManager calls waitForRegionServers,
which waits for region servers to check in. HMaster startup continues once all of the following hold:
a. at least 4.5 s have elapsed ("hbase.master.wait.on.regionservers.timeout")
b. at least one region server has checked in ("hbase.master.wait.on.regionservers.mintostart")
c. no region server has checked in or died for 1.5 s ("hbase.master.wait.on.regionservers.interval")
public void waitForRegionServers(MonitoredTask status)
    throws InterruptedException {
  final long interval = this.master.getConfiguration().
    getLong(WAIT_ON_REGIONSERVERS_INTERVAL, 1500);
  final long timeout = this.master.getConfiguration().
    getLong(WAIT_ON_REGIONSERVERS_TIMEOUT, 4500);
  int minToStart = this.master.getConfiguration().
    getInt(WAIT_ON_REGIONSERVERS_MINTOSTART, 1);
  if (minToStart < 1) {
    LOG.warn(String.format(
      "The value of '%s' (%d) can not be less than 1, ignoring.",
      WAIT_ON_REGIONSERVERS_MINTOSTART, minToStart));
    minToStart = 1;
  }
  int maxToStart = this.master.getConfiguration().
    getInt(WAIT_ON_REGIONSERVERS_MAXTOSTART, Integer.MAX_VALUE);
  if (maxToStart < minToStart) {
    LOG.warn(String.format(
      "The value of '%s' (%d) is set less than '%s' (%d), ignoring.",
      WAIT_ON_REGIONSERVERS_MAXTOSTART, maxToStart,
      WAIT_ON_REGIONSERVERS_MINTOSTART, minToStart));
    maxToStart = Integer.MAX_VALUE;
  }

  long now = System.currentTimeMillis();
  final long startTime = now;
  long slept = 0;
  long lastLogTime = 0;
  long lastCountChange = startTime;
  int count = countOfRegionServers();
  int oldCount = 0;
  while (
    !this.master.isStopped() &&
      count < maxToStart &&
      (lastCountChange + interval > now || timeout > slept || count < minToStart)
    ) {
    // Log some info at every interval time or if there is a change
    if (oldCount != count || lastLogTime + interval < now) {
      lastLogTime = now;
      String msg =
        "Waiting for region servers count to settle; currently" +
          " checked in " + count + ", slept for " + slept + " ms," +
          " expecting minimum of " + minToStart + ", maximum of " + maxToStart +
          ", timeout of " + timeout + " ms, interval of " + interval + " ms.";
      LOG.info(msg);
      status.setStatus(msg);
    }

    // We sleep for some time
    final long sleepTime = 50;
    Thread.sleep(sleepTime);
    now = System.currentTimeMillis();
    slept = now - startTime;

    oldCount = count;
    count = countOfRegionServers();
    if (count != oldCount) {
      lastCountChange = now;
    }
  }

  LOG.info("Finished waiting for region servers count to settle;" +
    " checked in " + count + ", slept for " + slept + " ms," +
    " expecting minimum of " + minToStart + ", maximum of " + maxToStart + "," +
    " master is " + (this.master.isStopped() ? "stopped." : "running.")
  );
}
4) HLog handling (the pasted code covers the core of the flow)
The goal is to find HLogs that no longer have a live owning server and split them so their regions can be handed to other region servers.
MasterFileSystem calls splitLogAfterStartup:
1. Take the split lock.
2. Collide the full set of HLog directories against the online region servers; any log directory with no matching server goes into splitLog.
3. Create an HLogSplitter, wait for HDFS to leave safe mode, then call HLogSplitter.splitLog.
4. HLogSplitter's reader reads the HLogs into the in-memory EntryBuffers; once a log has been read it is moved away, corrupted logs going to .corrupt and processed ones to .oldlogs.
5. HLogSplitter's OutputSink starts several writer threads that drain the entryBuffers and write files under each region's recovered.edits directory.
6. Release the lock.
this.splitLogLock.lock();
try {
  HLogSplitter splitter = HLogSplitter.createLogSplitter(
    conf, rootdir, logDir, oldLogDir, this.fs);
  try {
    // If FS is in safe mode, just wait till out of it.
    FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
    splitter.splitLog();
  } catch (OrphanHLogAfterSplitException e) {
    LOG.warn("Retrying splitting because of:", e);
    // An HLogSplitter instance can only be used once.  Get new instance.
    splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
      oldLogDir, this.fs);
    splitter.splitLog();
  }
  splitTime = splitter.getTime();
  splitLogSize = splitter.getSize();
} finally {
  this.splitLogLock.unlock();
}
private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {
  List<Path> processedLogs = new ArrayList<Path>();
  List<Path> corruptedLogs = new ArrayList<Path>();
  List<Path> splits = null;

  boolean skipErrors = conf.getBoolean("hbase.hlog.split.skip.errors", true);

  countTotalBytes(logfiles);
  splitSize = 0;

  outputSink.startWriterThreads(entryBuffers);

  try {
    int i = 0;
    for (FileStatus log : logfiles) {
      Path logPath = log.getPath();
      long logLength = log.getLen();
      splitSize += logLength;
      logAndReport("Splitting hlog " + (i++ + 1) + " of " + logfiles.length
          + ": " + logPath + ", length=" + logLength);
      Reader in;
      try {
        in = getReader(fs, log, conf, skipErrors);
        if (in != null) {
          parseHLog(in, logPath, entryBuffers, fs, conf, skipErrors);
          try {
            in.close();
          } catch (IOException e) {
            LOG.warn("Close log reader threw exception -- continuing", e);
          }
        }
        processedLogs.add(logPath);
      } catch (CorruptedLogFileException e) {
        LOG.info("Got while parsing hlog " + logPath +
            ". Marking as corrupted", e);
        corruptedLogs.add(logPath);
        continue;
      }
    }
    status.setStatus("Log splits complete. Checking for orphaned logs.");
    if (fs.listStatus(srcDir).length > processedLogs.size()
        + corruptedLogs.size()) {
      throw new OrphanHLogAfterSplitException(
          "Discovered orphan hlog after split. Maybe the "
          + "HRegionServer was not dead when we started");
    }
  } finally {
    status.setStatus("Finishing writing output logs and closing down.");
    splits = outputSink.finishWritingAndClose();
  }
  status.setStatus("Archiving logs after completed split");
  archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
  return splits;
}
5) HMaster assigns the ROOT and META tables
int assignRootAndMeta(MonitoredTask status)
    throws InterruptedException, IOException, KeeperException {
  int assigned = 0;
  long timeout = this.conf.getLong("hbase.catalog.verification.timeout", 1000);

  // Work on ROOT region.  Is it in zk in transition?
  status.setStatus("Assigning ROOT region");
  boolean rit = this.assignmentManager.
    processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
  ServerName currentRootServer = null;
  boolean rootRegionLocation = catalogTracker.verifyRootRegionLocation(timeout);
  if (!rit && !rootRegionLocation) {
    currentRootServer = this.catalogTracker.getRootLocation();
    splitLogAndExpireIfOnline(currentRootServer);
    this.assignmentManager.assignRoot();
    waitForRootAssignment();
    assigned++;
  } else if (rit && !rootRegionLocation) {
    waitForRootAssignment();
    assigned++;
  } else {
    // Region already assigned. We didn't assign it. Add to in-memory state.
    this.assignmentManager.regionOnline(HRegionInfo.ROOT_REGIONINFO,
        this.catalogTracker.getRootLocation());
  }
  // Enable the ROOT table if on process fail over the RS containing ROOT
  // was active.
  enableCatalogTables(Bytes.toString(HConstants.ROOT_TABLE_NAME));
  LOG.info("-ROOT- assigned=" + assigned + ", rit=" + rit +
    ", location=" + catalogTracker.getRootLocation());

  // Work on meta region
  status.setStatus("Assigning META region");
  rit = this.assignmentManager.
    processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
  boolean metaRegionLocation = this.catalogTracker.verifyMetaRegionLocation(timeout);
  if (!rit && !metaRegionLocation) {
    ServerName currentMetaServer =
      this.catalogTracker.getMetaLocationOrReadLocationFromRoot();
    if (currentMetaServer != null
        && !currentMetaServer.equals(currentRootServer)) {
      splitLogAndExpireIfOnline(currentMetaServer);
    }
    assignmentManager.assignMeta();
    enableSSHandWaitForMeta();
    assigned++;
  } else if (rit && !metaRegionLocation) {
    enableSSHandWaitForMeta();
    assigned++;
  } else {
    // Region already assigned. We didnt' assign it. Add to in-memory state.
    this.assignmentManager.regionOnline(HRegionInfo.FIRST_META_REGIONINFO,
      this.catalogTracker.getMetaLocation());
  }
  enableCatalogTables(Bytes.toString(HConstants.META_TABLE_NAME));
  LOG.info(".META. assigned=" + assigned + ", rit=" + rit +
    ", location=" + catalogTracker.getMetaLocation());
  status.setStatus("META and ROOT assigned.");
  return assigned;
}
6) Preparation: collect offline servers (with their regions) and the regions assigned in ZK
HMaster calls assignmentManager's joinCluster method, which in turn calls rebuildUserRegions
to collect the offline region servers together with their regions, plus the regions already registered in ZK (covering the case where the previous HMaster died).
Map<ServerName, List<Pair<HRegionInfo, Result>>> rebuildUserRegions()
    throws IOException, KeeperException {
  // Region assignment from META
  List<Result> results = MetaReader.fullScan(this.catalogTracker);
  // Get any new but slow to checkin region server that joined the cluster
  Set<ServerName> onlineServers = serverManager.getOnlineServers().keySet();
  // Map of offline servers and their regions to be returned
  Map<ServerName, List<Pair<HRegionInfo, Result>>> offlineServers =
    new TreeMap<ServerName, List<Pair<HRegionInfo, Result>>>();
  // Iterate regions in META
  for (Result result : results) {
    boolean disabled = false;
    boolean disablingOrEnabling = false;
    Pair<HRegionInfo, ServerName> region = MetaReader.parseCatalogResult(result);
    if (region == null) continue;
    HRegionInfo regionInfo = region.getFirst();
    ServerName regionLocation = region.getSecond();
    if (regionInfo == null) continue;
    String tableName = regionInfo.getTableNameAsString();
    if (regionLocation == null) {
      // regionLocation could be null if createTable didn't finish properly.
      // When createTable is in progress, HMaster restarts.
      // Some regions have been added to .META., but have not been assigned.
      // When this happens, the region's table must be in ENABLING state.
      // It can't be in ENABLED state as that is set when all regions are
      // assigned.
      // It can't be in DISABLING state, because DISABLING state transitions
      // from ENABLED state when application calls disableTable.
      // It can't be in DISABLED state, because DISABLED states transitions
      // from DISABLING state.
      if (false == checkIfRegionsBelongsToEnabling(regionInfo)) {
        LOG.warn("Region " + regionInfo.getEncodedName() +
          " has null regionLocation." + " But its table " + tableName +
          " isn't in ENABLING state.");
      }
      addTheTablesInPartialState(this.disablingTables, this.enablingTables,
        regionInfo, tableName);
    } else if (!onlineServers.contains(regionLocation)) {
      // Region is located on a server that isn't online
      List<Pair<HRegionInfo, Result>> offlineRegions =
        offlineServers.get(regionLocation);
      if (offlineRegions == null) {
        offlineRegions = new ArrayList<Pair<HRegionInfo, Result>>(1);
        offlineServers.put(regionLocation, offlineRegions);
      }
      offlineRegions.add(new Pair<HRegionInfo, Result>(regionInfo, result));
      disabled = checkIfRegionBelongsToDisabled(regionInfo);
      disablingOrEnabling = addTheTablesInPartialState(this.disablingTables,
        this.enablingTables, regionInfo, tableName);
      // need to enable the table if not disabled or disabling or enabling
      // this will be used in rolling restarts
      enableTableIfNotDisabledOrDisablingOrEnabling(disabled,
        disablingOrEnabling, tableName);
    } else {
      // If region is in offline and split state check the ZKNode
      if (regionInfo.isOffline() && regionInfo.isSplit()) {
        String node = ZKAssign.getNodeName(this.watcher,
          regionInfo.getEncodedName());
        Stat stat = new Stat();
        byte[] data = ZKUtil.getDataNoWatch(this.watcher, node, stat);
        // If znode does not exist dont consider this region
        if (data == null) {
          LOG.debug("Region " + regionInfo.getRegionNameAsString() +
            " split is completed. " + "Hence need not add to regions list");
          continue;
        }
      }
      // Region is being served and on an active server
      // add only if region not in disabled and enabling table
      if (false == checkIfRegionBelongsToDisabled(regionInfo)
          && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
        synchronized (this.regions) {
          regions.put(regionInfo, regionLocation);
          addToServers(regionLocation, regionInfo);
        }
      }
      disablingOrEnabling = addTheTablesInPartialState(this.disablingTables,
        this.enablingTables, regionInfo, tableName);
      disabled = checkIfRegionBelongsToDisabled(regionInfo);
      // need to enable the table if not disabled or disabling or enabling
      // this will be used in rolling restarts
      enableTableIfNotDisabledOrDisablingOrEnabling(disabled,
        disablingOrEnabling, tableName);
    }
  }
  return offlineServers;
}
7) Assign regions
assignmentManager calls processDeadServersAndRegionsInTransition.
Assignment splits into two cases: if there are regions that still need to be assigned, this is the master-failover case A; otherwise it is the clean-startup case B (a paraphrased sketch of this dispatch follows the list below):
Case A: processDeadServersAndRecoverLostRegions()
Case B: cleanoutUnassigned(), then assignAllUserRegions()
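Here is a paraphrased sketch of the dispatch between the two branches, following the description above rather than the verbatim 0.94 method body; nodesInZkUnassigned is an illustrative name for the list of children of the ZK unassigned node:

// Paraphrased dispatch inside processDeadServersAndRegionsInTransition;
// treat this as annotated pseudocode, not the actual source.
boolean failover = !deadServers.isEmpty() || !nodesInZkUnassigned.isEmpty();
if (failover) {
  // case A: a previous master died mid-flight -- recover those regions
  processDeadServersAndRecoverLostRegions(deadServers, nodesInZkUnassigned);
} else {
  // case B: clean startup -- clear the unassigned znodes, then assign everything
  cleanoutUnassigned();
  assignAllUserRegions();
}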
Branch A: processDeadServersAndRecoverLostRegions
Iterate over every offline region server and its regions:
1.1 Check whether the region still has a znode in ZK; if not, it has already been handled by a live region server.
1.2 If the znode exists, decide whether the region still needs to be assigned:
1.2.1 a region of a disabled table needs no handling;
1.2.2 a split region requires handling its daughters: if both daughter regions are missing, register them in the META table.
1.3 If the region does need assigning, call createOrForceNodeOffline, which sets a watch on the region.
private void processDeadServersAndRecoverLostRegions(
    Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers,
    List<String> nodes) throws IOException, KeeperException {
  if (null != deadServers) {
    Set<ServerName> actualDeadServers = this.serverManager.getDeadServers();
    for (Map.Entry<ServerName, List<Pair<HRegionInfo, Result>>> deadServer :
        deadServers.entrySet()) {
      // skip regions of dead servers because SSH will process regions during rs expiration.
      // see HBASE-5916
      if (actualDeadServers.contains(deadServer.getKey())) {
        for (Pair<HRegionInfo, Result> deadRegion : deadServer.getValue()) {
          nodes.remove(deadRegion.getFirst().getEncodedName());
        }
        continue;
      }
      List<Pair<HRegionInfo, Result>> regions = deadServer.getValue();
      for (Pair<HRegionInfo, Result> region : regions) {
        HRegionInfo regionInfo = region.getFirst();
        Result result = region.getSecond();
        // If region was in transition (was in zk) force it offline for
        // reassign
        try {
          RegionTransitionData data = ZKAssign.getData(watcher,
            regionInfo.getEncodedName());

          // If zk node of this region has been updated by a live server,
          // we consider that this region is being handled.
          // So we should skip it and process it in
          // processRegionsInTransition.
          if (data != null && data.getOrigin() != null &&
              serverManager.isServerOnline(data.getOrigin())) {
            LOG.info("The region " + regionInfo.getEncodedName()
                + "is being handled on " + data.getOrigin());
            continue;
          }
          // Process with existing RS shutdown code
          boolean assign = ServerShutdownHandler.processDeadRegion(
            regionInfo, result, this, this.catalogTracker);
          if (assign) {
            ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
              master.getServerName());
            if (!nodes.contains(regionInfo.getEncodedName())) {
              nodes.add(regionInfo.getEncodedName());
            }
          }
        } catch (KeeperException.NoNodeException nne) {
          // This is fine
        }
      }
    }
  }
}
The regions that need assignment then enter the RIT (region-in-transition) workflow via processRegionsInTransition.
For the RIT flow itself, here is a good analysis: http://blog.csdn.net/shenxiaoming77/article/details/18360199
void processRegionsInTransition(final RegionTransitionData data,
    final HRegionInfo regionInfo,
    final Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers,
    int expectedVersion) throws KeeperException {
  String encodedRegionName = regionInfo.getEncodedName();
  LOG.info("Processing region " + regionInfo.getRegionNameAsString() +
    " in state " + data.getEventType());
  synchronized (regionsInTransition) {
    RegionState regionState = regionsInTransition.get(encodedRegionName);
    if (regionState != null ||
        failoverProcessedRegions.containsKey(encodedRegionName)) {
      // Just return
      return;
    }
    switch (data.getEventType()) {
      case M_ZK_REGION_CLOSING:
        // If zk node of the region was updated by a live server skip this
        // region and just add it into RIT.
        if (isOnDeadServer(regionInfo, deadServers) &&
            (data.getOrigin() == null ||
              !serverManager.isServerOnline(data.getOrigin()))) {
          // If was on dead server, its closed now. Force to OFFLINE and this
          // will get it reassigned if appropriate
          forceOffline(regionInfo, data);
        } else {
          // Just insert region into RIT.
          // If this never updates the timeout will trigger new assignment
          regionsInTransition.put(encodedRegionName, new RegionState(
            regionInfo, RegionState.State.CLOSING,
            data.getStamp(), data.getOrigin()));
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_CLOSED:
      case RS_ZK_REGION_FAILED_OPEN:
        // Region is closed, insert into RIT and handle it
        addToRITandCallClose(regionInfo, RegionState.State.CLOSED, data);
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case M_ZK_REGION_OFFLINE:
        // If zk node of the region was updated by a live server skip this
        // region and just add it into RIT.
        if (isOnDeadServer(regionInfo, deadServers) &&
            (data.getOrigin() == null ||
              !serverManager.isServerOnline(data.getOrigin()))) {
          // Region is offline, insert into RIT and handle it like a closed
          addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, data);
        } else if (data.getOrigin() != null &&
            !serverManager.isServerOnline(data.getOrigin())) {
          // to handle cases where offline node is created but sendRegionOpen
          // RPC is not yet sent
          addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, data);
        } else {
          regionsInTransition.put(encodedRegionName, new RegionState(
            regionInfo, RegionState.State.PENDING_OPEN, data.getStamp(),
            data.getOrigin()));
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_OPENING:
        // TODO: Could check if it was on deadServers. If it was, then we could
        // do what happens in TimeoutMonitor when it sees this condition.

        // Just insert region into RIT
        // If this never updates the timeout will trigger new assignment
        if (regionInfo.isMetaTable()) {
          regionsInTransition.put(encodedRegionName, new RegionState(
            regionInfo, RegionState.State.OPENING, data.getStamp(),
            data.getOrigin()));
          // If ROOT or .META. table is waiting for timeout monitor to assign
          // it may take lot of time when the assignment.timeout.period is
          // the default value which may be very long. We will not be able
          // to serve any request during this time.
          // So we will assign the ROOT and .META. region immediately.
          processOpeningState(regionInfo);
          break;
        }
        regionsInTransition.put(encodedRegionName, new RegionState(regionInfo,
          RegionState.State.OPENING, data.getStamp(), data.getOrigin()));
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;

      case RS_ZK_REGION_OPENED:
        // Region is opened, insert into RIT and handle it
        regionsInTransition.put(encodedRegionName, new RegionState(
          regionInfo, RegionState.State.OPEN,
          data.getStamp(), data.getOrigin()));
        ServerName sn = data.getOrigin() == null ? null : data.getOrigin();
        // sn could be null if this server is no longer online. If
        // that is the case, just let this RIT timeout; it'll be assigned
        // to new server then.
        if (sn == null) {
          LOG.warn("Region in transition " + regionInfo.getEncodedName() +
            " references a null server; letting RIT timeout so will be " +
            "assigned elsewhere");
        } else if (!serverManager.isServerOnline(sn) &&
            (isOnDeadServer(regionInfo, deadServers) ||
              regionInfo.isMetaRegion() || regionInfo.isRootRegion())) {
          forceOffline(regionInfo, data);
        } else {
          new OpenedRegionHandler(master, this, regionInfo, sn, expectedVersion)
            .process();
        }
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
        break;
    }
  }
}
Branch B:
1. Call cleanoutUnassigned to delete every unassigned znode in ZK and set the watches again.
2. Call assignAllUserRegions:
a. fetch all user regions;
b. assign them: when hbase.master.startup.retainassign is true, regions are assigned according to the locations recorded in the META table; otherwise an online region server is picked at random (a toy sketch of this choice follows below).
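To illustrate the retainassign choice, here is a toy, self-contained model; the names (plan, lastLocations) are illustrative, not HBase's API, and in HBase itself the real work is done by the load balancer's retain/round-robin assignment paths:

import java.util.*;

/** Toy model of hbase.master.startup.retainassign: keep a region on its last
 *  known server when that server is still online, else pick a random one. */
public class RetainAssignSketch {
  static Map<String, String> plan(Map<String, String> lastLocations,
      List<String> onlineServers, boolean retainAssign) {
    Map<String, String> plan = new HashMap<>();
    Random rnd = new Random();
    for (Map.Entry<String, String> e : lastLocations.entrySet()) {
      String region = e.getKey(), last = e.getValue();
      if (retainAssign && last != null && onlineServers.contains(last)) {
        plan.put(region, last); // keep the location recorded in .META.
      } else {
        plan.put(region, onlineServers.get(rnd.nextInt(onlineServers.size())));
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    Map<String, String> meta = new LinkedHashMap<>();
    meta.put("region-1", "rs1");
    meta.put("region-2", "rs-dead"); // its old server is gone
    System.out.println(plan(meta, Arrays.asList("rs1", "rs2"), true));
    // e.g. {region-1=rs1, region-2=rs2}
  }
}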
8) Check daughter regions
void fixupDaughters(final MonitoredTask status) throws IOException {
  final Map<HRegionInfo, Result> offlineSplitParents =
    new HashMap<HRegionInfo, Result>();
  // This visitor collects offline split parents in the .META. table
  MetaReader.Visitor visitor = new MetaReader.Visitor() {
    @Override
    public boolean visit(Result r) throws IOException {
      if (r == null || r.isEmpty()) return true;
      HRegionInfo info =
        MetaReader.parseHRegionInfoFromCatalogResult(
          r, HConstants.REGIONINFO_QUALIFIER);
      if (info == null) return true; // Keep scanning
      if (info.isOffline() && info.isSplit()) {
        offlineSplitParents.put(info, r);
      }
      // Returning true means "keep scanning"
      return true;
    }
  };
  // Run full scan of .META. catalog table passing in our custom visitor
  MetaReader.fullScan(this.catalogTracker, visitor);
  // Now work on our list of found parents. See if any we can clean up.
  int fixups = 0;
  for (Map.Entry<HRegionInfo, Result> e : offlineSplitParents.entrySet()) {
    fixups += ServerShutdownHandler.fixupDaughters(
      e.getValue(), assignmentManager, catalogTracker);
  }
  if (fixups != 0) {
    LOG.info("Scanned the catalog and fixed up " + fixups +
      " missing daughter region(s)");
  }
}
9) Start the balancer thread
The HMaster balance policy bounds each region server between min = floor(regions/servers) and max = ceil(regions/servers), i.e. regions/servers + 1 when the division isn't even (for example, 10 regions on 3 servers gives min 3, max 4).
It first finds the region servers above max and collects the regions they must shed, along with the servers below min.
It then hands the shed regions to the servers below min, filling each until it reaches max or the regions run out; a toy sketch of these bounds follows, and the master-side entry point balance() comes after it.
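A toy, self-contained sketch of those min/max bounds; HBase's real DefaultLoadBalancer.balanceCluster is considerably more involved, and this only mirrors the arithmetic described above:

import java.util.*;

/** min = floor(regions/servers), max = ceil(regions/servers): servers above max
 *  shed regions, and servers below min receive them, up to max each. */
public class BalanceBoundsSketch {
  public static void main(String[] args) {
    Map<String, Integer> load = new LinkedHashMap<>();
    load.put("rs1", 6); load.put("rs2", 3); load.put("rs3", 1);
    int regions = 10, servers = load.size();
    int min = (int) Math.floor((double) regions / servers); // 3
    int max = (int) Math.ceil((double) regions / servers);  // 4
    int shed = 0;
    for (Map.Entry<String, Integer> e : load.entrySet()) {  // shed from overloaded
      if (e.getValue() > max) { shed += e.getValue() - max; e.setValue(max); }
    }
    for (Map.Entry<String, Integer> e : load.entrySet()) {  // refill the underloaded
      if (e.getValue() >= min) continue;
      while (shed > 0 && e.getValue() < max) { e.setValue(e.getValue() + 1); shed--; }
    }
    System.out.println("min=" + min + " max=" + max + " load=" + load);
    // -> min=3 max=4 load={rs1=4, rs2=3, rs3=3}
  }
}

The master-side entry point balance(), below, delegates the per-table assignment maps to LoadBalancer.balanceCluster and executes the resulting RegionPlans until the cutoff time: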
@Override
public boolean balance() {
  // if master not initialized, don't run balancer.
  if (!this.initialized) {
    LOG.debug("Master has not been initialized, don't run balancer.");
    return false;
  }
  // If balance not true, don't run balancer.
  if (!this.balanceSwitch) return false;
  // Do this call outside of synchronized block.
  int maximumBalanceTime = getBalancerCutoffTime();
  long cutoffTime = System.currentTimeMillis() + maximumBalanceTime;
  boolean balancerRan;
  synchronized (this.balancer) {
    // Only allow one balance run at at time.
    if (this.assignmentManager.isRegionsInTransition()) {
      LOG.debug("Not running balancer because " +
        this.assignmentManager.getRegionsInTransition().size() +
        " region(s) in transition: " +
        org.apache.commons.lang.StringUtils.
          abbreviate(this.assignmentManager.getRegionsInTransition().toString(), 256));
      return false;
    }
    if (this.serverManager.areDeadServersInProgress()) {
      LOG.debug("Not running balancer because processing dead regionserver(s): " +
        this.serverManager.getDeadServers());
      return false;
    }

    if (this.cpHost != null) {
      try {
        if (this.cpHost.preBalance()) {
          LOG.debug("Coprocessor bypassing balancer request");
          return false;
        }
      } catch (IOException ioe) {
        LOG.error("Error invoking master coprocessor preBalance()", ioe);
        return false;
      }
    }

    Map<String, Map<ServerName, List<HRegionInfo>>> assignmentsByTable =
      this.assignmentManager.getAssignmentsByTable();

    List<RegionPlan> plans = new ArrayList<RegionPlan>();
    for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) {
      List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments);
      if (partialPlans != null) plans.addAll(partialPlans);
    }
    int rpCount = 0;  // number of RegionPlans balanced so far
    long totalRegPlanExecTime = 0;
    balancerRan = plans != null;
    if (plans != null && !plans.isEmpty()) {
      for (RegionPlan plan : plans) {
        LOG.info("balance " + plan);
        long balStartTime = System.currentTimeMillis();
        this.assignmentManager.balance(plan);
        totalRegPlanExecTime += System.currentTimeMillis() - balStartTime;
        rpCount++;
        if (rpCount < plans.size() &&
            // if performing next balance exceeds cutoff time, exit the loop
            (System.currentTimeMillis() + (totalRegPlanExecTime / rpCount)) > cutoffTime) {
          LOG.debug("No more balancing till next balance run; maximumBalanceTime=" +
            maximumBalanceTime);
          break;
        }
      }
    }
    if (this.cpHost != null) {
      try {
        this.cpHost.postBalance();
      } catch (IOException ioe) {
        // balancing already succeeded so don't change the result
        LOG.error("Error invoking master coprocessor postBalance()", ioe);
      }
    }
  }
  return balancerRan;
}
Finally done. Thanks to anyone who read this far; since it was all rewritten from memory at home it's a bit rough, sorry about that.