Is HDFS an append-only file system? Then, how do people modify the files stored on HDFS?

HDFS is append-only, yes. The short answer to your question is that, to modify any portion of a file that is already written, one must rewrite the entire file and replace the old file.

"Even for a single byte?" Yes, even for a single byte.
"Really?!" Yep, really. 

"Isn't that horribly inefficient?"
Yep, but it usually doesn't matter, because large data processing applications are typically built around the idea that things don't change piecemeal like this. Let's take a few examples.

Let's say you were to build a typical fact table in Apache Hive or Cloudera Impala to store transactional data from a point-of-sale system or an e-commerce website. Users can make purchases, in which case we simply add new records to the table. This works just fine in an append-only system. The question is, what happens if someone cancels an order or wants to update the quantity of an item purchased? You might be tempted to update the existing record. In fact, what you should probably do, to preserve the series of actions, is append an adjustment or delta record that indicates a modification occurred to a previous transaction. To do this, we'd use a schema something like this (I'm going to ignore some details):

CREATE TABLE order_item_transactions (
  transaction_id bigint,  -- unique for each record
  order_id bigint,        -- non-unique
  version int,
  product_id bigint,
  quantity int
);

When we update an order item, we use a new transaction_id but the same order_id. We bump the version (or use an epoch timestamp as the version) to indicate that the latter record takes precedence over the former. This is extremely common in data warehousing. We may also choose to build a derived table (effectively a materialized view) that contains only the latest version of each order item, so that order_id is unique. Something equivalent to:

CREATE TABLE latest_order_items AS
SELECT transaction_id, order_id, version, product_id, quantity
FROM (
  SELECT t.*,
         row_number() OVER (PARTITION BY order_id ORDER BY version DESC) AS rn
  FROM order_item_transactions t
) ranked
WHERE rn = 1;
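
To make the append-only update concrete, here's a hedged sketch (Hive 0.14+ INSERT ... VALUES syntax; the order, product, and quantity values are made up) of what a change to an order looks like in the base table: two appended rows, never an in-place UPDATE.

-- Original purchase: order 500, product 42, quantity 3
INSERT INTO TABLE order_item_transactions VALUES (1001, 500, 1, 42, 3);
-- The customer later reduces the quantity to 1: append a new record with a new
-- transaction_id, the same order_id, and a bumped version
INSERT INTO TABLE order_item_transactions VALUES (1002, 500, 2, 42, 1);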

HBase, which does need to support in-place modification, uses this same versioning technique to modify and delete records. During a "compaction," it removes old versions of records.
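
The rough Hive-side analogue, sticking with the hypothetical tables above, is to periodically rebuild the derived table rather than touch individual rows: old versions simply drop out of the rewritten output, and the refresh replaces the table's files wholesale, which is exactly the whole-file rewrite described at the top of this answer.

-- Periodic refresh of latest_order_items (a loose analogue of a compaction)
INSERT OVERWRITE TABLE latest_order_items
SELECT transaction_id, order_id, version, product_id, quantity
FROM (
  SELECT t.*,
         row_number() OVER (PARTITION BY order_id ORDER BY version DESC) AS rn
  FROM order_item_transactions t
) ranked
WHERE rn = 1;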

A side benefit of this, and the reason it's so popular in data warehousing, is that, even though you frequently need only the most recent version of a record, you also want a full log of the changes made, for auditing purposes. Since Hadoop typically deals with batch data processing and long-term storage applications, being append-only isn't as much of a limitation as you'd expect.
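
As a sketch of that audit view, assuming the same hypothetical schema, the full change history of a single order is just a filter over the append-only base table:

-- Every recorded action for order 500, in the order it was applied
SELECT transaction_id, order_id, version, product_id, quantity
FROM order_item_transactions
WHERE order_id = 500
ORDER BY version;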

The MapR Distribution for Apache Hadoop supports random reads/writes, thus eliminating this problem.

Eric Sammer is correct that many use cases involve adding data and not necessarily changing it, but there are many benefits to supporting random writes as well.

First, it's a prerequisite for supporting standard interfaces such as NFS. The NFS protocol doesn't have a concept of opening or closing a file, so the only way it can be supported is with an underlying storage system that supports random writes. Furthermore, the vast majority of tools that exist today were not designed to work with an append-only system (the last such systems were CD-ROM and FTP, both 20 years old now), so they commonly write at random offsets (and even when they don't, the requests could get reordered on the host or on the network).

Second, having random write support enables innovation and capabilities that would otherwise not be possible. For example, MapR addressed the major HBase limitations (see MapR M7) by taking advantage of the underlying capabilities, including random write support. Apache HBase was designed to work around the limitations of HDFS, and that comes at a high cost (e.g., MapR M7 eliminates compactions, which affect all stock HBase users).

References

http://www.quora.com/Is-HDFS-an-append-only-file-system-Then-how-do-people-modify-the-files-stored-on-HDFS
