quantitative analyst
round 1 - background, probability
round 2 - R data manipulation; simulating a probability game (best-of-7 series, first to 4 wins)
round 3 - business sense
round 4 - stats: with 3 thermometers, how to get the smallest variance (use weights); geometric distribution, P(first head); P(first tail | first head)
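Two of the questions above lend themselves to a quick sketch. This is not what I wrote in the interview, just one way to do it. It assumes each game in the series is independent with a fixed win probability p, and that the three thermometers are unbiased, independent, and have known variances (the variance values below are made up). Under those assumptions the minimum-variance weights are proportional to 1/variance. The function names are my own.

```python
import random


def series_win_prob(p, wins_needed=4, trials=100_000, seed=0):
    """Estimate P(team A wins a first-to-`wins_needed` series) by simulation.

    Assumes every game is independent and A wins each game with probability p.
    """
    rng = random.Random(seed)
    a_series_wins = 0
    for _ in range(trials):
        a = b = 0
        # Play games until one side reaches the required number of wins.
        while a < wins_needed and b < wins_needed:
            if rng.random() < p:
                a += 1
            else:
                b += 1
        a_series_wins += (a == wins_needed)
    return a_series_wins / trials


def inverse_variance_weights(variances):
    """Weights that minimize Var(sum w_i * X_i) subject to sum w_i = 1.

    Assumes the readings X_i are independent and unbiased.
    Returns the weights and the variance of the weighted average.
    """
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    weights = [x / total for x in inv]
    combined_var = 1.0 / total
    return weights, combined_var


# Best-of-7 with evenly matched teams: should be close to 0.5.
print(series_win_prob(0.5))

# Hypothetical thermometer variances, not from the interview.
variances = [1.0, 2.0, 4.0]
weights, combined_var = inverse_variance_weights(variances)
print(weights, combined_var)
```

The combined variance is always below the smallest individual variance, which is the point of weighting instead of just picking the best thermometer.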
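Data manipulation, in either R or Python. The interviewer emailed me a CSV file on the spot. It contained prices for several products from different vendors in different countries, with the following columns: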
country | type | vendor1 | vendor2 | vendor3 | vendor4
india | 1 | 13.5 | 14 | 15 | 14.5
1) First, compute the overall median across vendor1 to vendor4.
import pandas as pd
import numpy as np
df = pd.read_csv('input.csv')
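# method 1: stack the four vendor columns into one Series
# (Series.append was removed in pandas 2.0; pd.concat is the replacement)
all_vendors1 = pd.concat([df['vendor1'], df['vendor2'], df['vendor3'], df['vendor4']])
median1 = np.median(all_vendors1)
# method 2: flatten the underlying 2-D array of vendor prices
all_vendors2 = pd.Series(df[['vendor1','vendor2','vendor3','vendor4']].values.ravel())
median2 = np.median(all_vendors2)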
2) Then group by country and type and compute the median of vendor1 to vendor4.
median_vendor = df.groupby(['country','type']).median()
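The groupby above gives one median per vendor column within each group. If the question meant a single median pooled over all four vendors within each (country, type), reshaping to long format is one way to get it. A minimal sketch follows; only the india / type 1 row comes from the file I was given, and the other rows (including the missing value) are made up for illustration. pandas `.median()` skips missing values by default.

```python
import pandas as pd

# Hypothetical data; only the first row matches the sample row in the post.
df = pd.DataFrame({
    "country": ["india", "india", "china", "china"],
    "type":    [1, 1, 2, 2],
    "vendor1": [13.5, 12.0, 20.0, 22.0],
    "vendor2": [14.0, 12.5, 21.0, None],   # one missing price
    "vendor3": [15.0, 13.0, 19.0, 23.0],
    "vendor4": [14.5, 11.5, 20.5, 21.5],
})
vendor_cols = ["vendor1", "vendor2", "vendor3", "vendor4"]

# Wide -> long: one row per (country, type, vendor, price).
long_df = df.melt(
    id_vars=["country", "type"],
    value_vars=vendor_cols,
    var_name="vendor",
    value_name="price",
)

# Overall median across all vendors (question 1), ignoring missing prices.
overall_median = long_df["price"].median()

# One pooled median per (country, type) across all four vendors.
pooled_by_group = long_df.groupby(["country", "type"])["price"].median()

print(overall_median)
print(pooled_by_group)
```

The long format also makes it easy to add vendor as a grouping key later, for example `long_df.groupby(["country", "type", "vendor"])`.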
Two-way ANOVA with two factors: one is vendor, which has 4 levels; the other is product type, which is determined by country + type.
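I didn't reach any concrete conclusion on this part and never ran an actual analysis. At the time I only talked through the approach: a t-test, ANOVA, or linear regression could be used. The interviewer didn't say whether that was right or wrong, and didn't give an answer either...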
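For reference, here is a rough sketch of what the two-way ANOVA could look like if each country + type combination has exactly one price per vendor (two-way ANOVA without replication, so no interaction term can be estimated). That layout is my assumption about the file, and all rows except india / type 1 are made up. I did not run anything like this in the interview.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the first row matches the sample row in the post.
df = pd.DataFrame({
    "country": ["india", "india", "china", "china", "brazil"],
    "type":    [1, 2, 1, 2, 1],
    "vendor1": [13.5, 30.0, 12.0, 28.5, 15.0],
    "vendor2": [14.0, 31.0, 12.5, 29.0, 15.5],
    "vendor3": [15.0, 32.5, 13.5, 30.5, 17.0],
    "vendor4": [14.5, 31.5, 12.0, 29.5, 16.0],
})
vendor_cols = ["vendor1", "vendor2", "vendor3", "vendor4"]

# Rows = products (country + type), columns = vendors, one price per cell.
prices = df[vendor_cols].to_numpy(dtype=float)
n_products, n_vendors = prices.shape

grand_mean = prices.mean()
product_means = prices.mean(axis=1)   # one mean per row
vendor_means = prices.mean(axis=0)    # one mean per column

# Sum-of-squares decomposition: total = product + vendor + error.
ss_total = ((prices - grand_mean) ** 2).sum()
ss_product = n_vendors * ((product_means - grand_mean) ** 2).sum()
ss_vendor = n_products * ((vendor_means - grand_mean) ** 2).sum()
ss_error = ss_total - ss_product - ss_vendor

df_product = n_products - 1
df_vendor = n_vendors - 1
df_error = df_product * df_vendor

# F statistics: mean square of each factor over the error mean square.
ms_error = ss_error / df_error
f_vendor = (ss_vendor / df_vendor) / ms_error
f_product = (ss_product / df_product) / ms_error

print("F(vendor)  =", f_vendor, "df =", (df_vendor, df_error))
print("F(product) =", f_product, "df =", (df_product, df_error))
```

A p-value would come from the upper tail of the F distribution with those degrees of freedom, for example via `scipy.stats.f.sf(f_vendor, df_vendor, df_error)` if SciPy is available.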
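Thinking about it now, I probably should have plotted the data first: look at the correlation between vendors, use boxplots/violin plots to see the distributions, then break down by type and country to get an idea of what the data looks like. Only then run those tests.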
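A first look along those lines might be something like the sketch below. It only computes the numbers (correlations, the quartiles a boxplot would show, and breakdowns by type and country) so it runs without a plotting library. The data is made up apart from the india / type 1 row.

```python
import pandas as pd

# Hypothetical data; only the first row matches the sample row in the post.
df = pd.DataFrame({
    "country": ["india", "india", "china", "china", "brazil"],
    "type":    [1, 2, 1, 2, 1],
    "vendor1": [13.5, 30.0, 12.0, 28.5, 15.0],
    "vendor2": [14.0, 31.0, 12.5, 29.0, 15.5],
    "vendor3": [15.0, 32.5, 13.5, 30.5, 17.0],
    "vendor4": [14.5, 31.5, 12.0, 29.5, 16.0],
})
vendor_cols = ["vendor1", "vendor2", "vendor3", "vendor4"]

# Pairwise correlation between vendor price columns.
vendor_corr = df[vendor_cols].corr()

# The five numbers a boxplot draws, per vendor.
summary = df[vendor_cols].describe().loc[["min", "25%", "50%", "75%", "max"]]

# Breakdowns by type and by country.
by_type = df.groupby("type")[vendor_cols].median()
by_country = df.groupby("country")[vendor_cols].median()

print(vendor_corr)
print(summary)
print(by_type)
print(by_country)
```

With matplotlib installed, `df[vendor_cols].plot.box()` would draw the actual boxplots from the same frame.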