0. zipline commands for working with bundles
zipline bundles shows the bundle data that you already have locally:
zipline bundles
zipline clean removes old data that is no longer needed. Replace bundle and int in the following command with your own values:
zipline clean [-b bundle] --keep-last int
zipline ingest fetches the data for a bundle from its configured source. Replace yourkey and bundle in the following command with your own values:
QUANDL_API_KEY=yourkey zipline ingest [-b bundle]
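To make the effect of --keep-last concrete, here is a minimal sketch of the idea. It is not zipline's implementation: it assumes each ingestion is identified by a timestamp, and the helper name split_keep_last is hypothetical.

```python
from datetime import datetime


def split_keep_last(ingestions, keep_last):
    """Split ingestion timestamps into (kept, removed).

    The newest `keep_last` ingestions are kept; everything older is removed.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    ordered = sorted(ingestions, reverse=True)  # newest first
    return ordered[:keep_last], ordered[keep_last:]


# Three made-up ingestion timestamps, deliberately out of order.
ingestions = [
    datetime(2017, 1, 1),
    datetime(2017, 3, 1),
    datetime(2017, 2, 1),
]
kept, removed = split_keep_last(ingestions, keep_last=1)
```

With keep_last=1 only the March ingestion stays in kept, and the two older ones end up in removed.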
1. How zipline uses this data
The _run function receives the parameters bundle and bundle_timestamp, and together these two parameters identify a data set. Inside _run we see this code:
if bundle is not None:
    bundle_data = bundles.load(
        bundle,
        environ,
        bundle_timestamp,
    )
    prefix, connstr = re.split(
        r'sqlite:///',
        str(bundle_data.asset_finder.engine.url),
        maxsplit=1,
    )
    if prefix:
        raise ValueError(
            "invalid url %r, must begin with 'sqlite:///'" %
            str(bundle_data.asset_finder.engine.url),
        )
    env = TradingEnvironment(asset_db_path=connstr, environ=environ)
    first_trading_day =\
        bundle_data.equity_minute_bar_reader.first_trading_day
    data = DataPortal(
        env.asset_finder,
        trading_calendar=trading_calendar,
        first_trading_day=first_trading_day,
        equity_minute_reader=bundle_data.equity_minute_bar_reader,
        equity_daily_reader=bundle_data.equity_daily_bar_reader,
        adjustment_reader=bundle_data.adjustment_reader,
    )
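The URL check in the middle of this snippet can be tried in isolation. The following is a small sketch using only the standard library; the helper name sqlite_path_from_url and the example URLs are made up for illustration and are not part of zipline.

```python
import re


def sqlite_path_from_url(url):
    """Return the part of `url` after a leading 'sqlite:///'.

    Raises ValueError if the URL does not begin with that prefix.
    """
    parts = re.split(r'sqlite:///', url, maxsplit=1)
    # If the pattern does not occur at all, re.split returns a
    # single-element list, so the length is checked explicitly here.
    if len(parts) != 2 or parts[0]:
        raise ValueError(
            "invalid url %r, must begin with 'sqlite:///'" % url
        )
    return parts[1]


# A made-up absolute path: note the fourth slash belongs to the path.
path = sqlite_path_from_url('sqlite:////tmp/bundle/assets.sqlite')

try:
    sqlite_path_from_url('postgresql://host/db')
    rejected = False
except ValueError:
    rejected = True
```

When the URL starts with the prefix, the first element of the split is an empty string, which is why the original code treats a non-empty prefix as an error.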
The code first loads the bundle data, then builds a TradingEnvironment, and finally builds a DataPortal. Let's look at the load function first:
def load(name, environ=os.environ, timestamp=None):
    """Loads a previously ingested bundle.

    Parameters
    ----------
    name : str
        The name of the bundle.
    environ : mapping, optional
        The environment variables. Defaults of os.environ.
    timestamp : datetime, optional
        The timestamp of the data to lookup.
        Defaults to the current time.

    Returns
    -------
    bundle_data : BundleData
        The raw data readers for this bundle.
    """
    if timestamp is None:
        timestamp = pd.Timestamp.utcnow()
    timestr = most_recent_data(name, timestamp, environ=environ)
    return BundleData(
        asset_finder=AssetFinder(
            asset_db_path(name, timestr, environ=environ),
        ),
        equity_minute_bar_reader=BcolzMinuteBarReader(
            minute_equity_path(name, timestr, environ=environ),
        ),
        equity_daily_bar_reader=BcolzDailyBarReader(
            daily_equity_path(name, timestr, environ=environ),
        ),
        adjustment_reader=SQLiteAdjustmentReader(
            adjustment_db_path(name, timestr, environ=environ),
        ),
    )
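The object returned by load can be pictured as a simple record with four fields, one per reader. The sketch below is not zipline's BundleData: BundleDataSketch and the string values are stand-ins that only mirror the keyword names in the load call above, to make the attribute access used in _run easier to follow.

```python
from collections import namedtuple

# Hypothetical stand-in; the field names are copied from the keyword
# arguments passed to BundleData in load.
BundleDataSketch = namedtuple(
    'BundleDataSketch',
    [
        'asset_finder',
        'equity_minute_bar_reader',
        'equity_daily_bar_reader',
        'adjustment_reader',
    ],
)

# Each string names the storage backend suggested by the reader used in
# load (SQLite for assets and adjustments, bcolz for the bar readers).
bundle_data = BundleDataSketch(
    asset_finder='sqlite',
    equity_minute_bar_reader='bcolz',
    equity_daily_bar_reader='bcolz',
    adjustment_reader='sqlite',
)

# Attribute access, in the same style as _run uses on the real object.
minute_backend = bundle_data.equity_minute_bar_reader
sqlite_fields = [
    field
    for field, backend in bundle_data._asdict().items()
    if backend == 'sqlite'
]
```

Here sqlite_fields collects the two SQLite-backed fields, and the remaining two fields are the bcolz-backed bar readers.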
Four data sets are loaded: two are stored in SQLite and the other two in bcolz. Why the storage is split this way is discussed later; for now we only need to know that the asset data, adjustment data, equity_minute_bar data and equity_daily_bar data are loaded here. After loading, _run constructs a TradingEnvironment, which also holds the asset-related information. The code is shown below:
class TradingEnvironment(object):
    ...
    def __init__(
        self,
        load=None,
        bm_symbol='SPY',
        exchange_tz="US/Eastern",
        trading_calendar=None,
        asset_db_path=':memory:',
        future_chain_predicates=CHAIN_PREDICATES,
        environ=None,
    ):
        ...
        if isinstance(asset_db_path, string_types):
            asset_db_path = 'sqlite:///' + asset_db_path
            self.engine = engine = create_engine(asset_db_path)
        else:
            self.engine = engine = asset_db_path
        if engine is not None:
            AssetDBWriter(engine).init_db()
            self.asset_finder = AssetFinder(
                engine,
                future_chain_predicates=future_chain_predicates)
        else:
            self.asset_finder = None
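Note that the constructor prepends 'sqlite:///' to a string path, which is the same prefix that _run strips off before passing connstr in. A minimal sketch of just that branching, with no database involved, might look like this; the function name normalize_asset_db_path and the example path are hypothetical.

```python
def normalize_asset_db_path(asset_db_path=':memory:'):
    """Mimic the string branch of the constructor.

    A string is treated as a SQLite path and turned into a URL; anything
    else (for example an existing engine object) is passed through as is.
    """
    if isinstance(asset_db_path, str):
        return 'sqlite:///' + asset_db_path
    return asset_db_path


# The default value yields an in-memory SQLite URL.
default_url = normalize_asset_db_path()

# A made-up file path, such as the connstr produced in _run.
file_url = normalize_asset_db_path('/tmp/bundle/assets.sqlite')

# A non-string argument stands in for an already created engine.
engine_like = object()
passthrough = normalize_asset_db_path(engine_like)
```

The real constructor goes one step further and hands the resulting URL to create_engine, then builds an AssetFinder on top of the engine.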
Finally, let's turn to DataPortal: the data that zipline uses is accessed through it. We will introduce it in the next section.