Mathematica data fitting tutorial

A common thing to do with data is fitting it to a function provided by theory. However, this isn't done by fancy mouse clicking, but in the usual Mathematica way – i.e. you have to a) enter everything yourself and b) be aware of the available features in advance, since nobody placed a blinking button for "Hey, let's paint confidence bands, shall we?" anywhere.
This notebook is basically divided into two parts, linear and nonlinear fitting. Although the topic is fitting, I'll sometimes have to rely on other Mathematica features you're not familiar with, for example to generate datasets I can use for the actual subject. On this, I have two recommendations: a) It's sufficient to read the explanation of the code above/below it. "Generates erf()-distributed data of some length" is enough to work with the data, don't be scared if you don't understand the code that is actually executed, because you don't need to. b) If you don't understand something in Mathematica, it's usually a good idea to read about it in the excellent help function. Not because you need it for this document, but for others you are going to write yourself.
How to evaluate this notebook. Many variables are reused later on, so it's a bad idea to evaluate everything and then start changing things in the middle. If you do so and run into unexpected results, re-evaluate the topmost chapter and it should work fine again. Oh, and if you've been asked to run the initialization cells and clicked on no, it's a good idea to scroll to the very bottom and evaluate the very last section, "Initialization", because some things will result in errors otherwise. ☺
Linear regression
The basics
Assembling data
LinearModelFit operates on lists of data. In the easiest case, it uses a list of the form {{Subscript[x, 1],Subscript[y, 1]}, {Subscript[x, 2],Subscript[y, 2]}, ...}. This usually is not the way the data is on your computer though, so let's first construct that list.
It is a good idea to organize data that belongs to the same measured variable in one separate list each. Suppose you've measured current and voltage and wrote the values down in your lab report. This is how I suggest you enter your data into the Mathematica notebook:
dataCurrent={0.0000,0.0012,0.0019,0.0034,0.0036,0.0050,0.0062,0.0072,0.0079,0.0092,0.0097}
dataVoltage={0.0,6.3,16.8,25.1,43.6,53.2,57.4,71.8,81.5,94.1,97.4}
{0.,0.0012,0.0019,0.0034,0.0036,0.005,0.0062,0.0072,0.0079,0.0092,0.0097}
{0.,6.3,16.8,25.1,43.6,53.2,57.4,71.8,81.5,94.1,97.4}
You could of course have entered the data as x/y pairs right away; however, keeping one list per measured variable as above is way more convenient for later data manipulation and reaggregation.
On a side note, it's also a good idea to convert all the data to flat SI units without prefixes, i.e. don't use milli-anything. If you've measured mA, enter the table like this:
(10^-3) {1.2,2.1,3.3,3.9} (* Factor 10^-3 to convert the values to A *)
{0.0012,0.0021,0.0033,0.0039}
Always doing this will save you a lot of trouble. Believe me, I've been there. (Example: Entering data in mA, but error bars in A. Not desirable, especially when your dataset has many parameters, dimensions and what not.)
Now we'll have to construct our list of current/voltage pairs. First of all, we obviously need to combine them into a single list. This is pretty straightforward:
dataCombined={dataCurrent,dataVoltage} (* {x values, y values} *)
{{0.,0.0012,0.0019,0.0034,0.0036,0.005,0.0062,0.0072,0.0079,0.0092,0.0097},{0.,6.3,16.8,25.1,43.6,53.2,57.4,71.8,81.5,94.1,97.4}}
Unfortunately, this groups currents first, then voltages, but we want them to appear in corresponding pairs. This can be easily accomplished by the Transpose function:
dataCombined=Transpose[{dataCurrent,dataVoltage}]
(* Or equivalently using the postfix \[Transpose] operator, entered as ESC tr ESC, which is less bulky *)
dataCombined={dataCurrent,dataVoltage}\[Transpose]
{{0.,0.},{0.0012,6.3},{0.0019,16.8},{0.0034,25.1},{0.0036,43.6},{0.005,53.2},{0.0062,57.4},{0.0072,71.8},{0.0079,81.5},{0.0092,94.1},{0.0097,97.4}}
{{0.,0.},{0.0012,6.3},{0.0019,16.8},{0.0034,25.1},{0.0036,43.6},{0.005,53.2},{0.0062,57.4},{0.0072,71.8},{0.0079,81.5},{0.0092,94.1},{0.0097,97.4}}
On a side note: If Transpose complains that the lists cannot be transposed because they are unequal in length, well, then they are. Check whether you've imported your data correctly, i.e. whether you've left out a value by accident or something like that.
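A quick way to diagnose this is to compare the list lengths directly; a minimal check, nothing more:
Length/@{dataCurrent,dataVoltage} (* both numbers must match for Transpose to work *)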
Usually you won't input data by hand, but the method presented here translates directly to long lists of data from, say, a text file: read the single data lists, combine and then transpose them.
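As a sketch of that workflow (the file names current.txt and voltage.txt are hypothetical here; adjust path and format to your data):
(* Each hypothetical file contains one value per line; the "List" element imports it as a flat list of numbers *)
dataCurrent=Import[NotebookDirectory[]<>"current.txt","List"];
dataVoltage=Import[NotebookDirectory[]<>"voltage.txt","List"];
dataCombined={dataCurrent,dataVoltage}\[Transpose];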
Data generation
We could start using the data I've entered above right now, but I think it could also be interesting to be able to change the datasets. For this reason, the code below generates data just like the data previously entered, but with its own random scatter. Qualitatively, the lists will be identical.
However, if you change the SeedRandom argument, you can generate custom datasets. (You can also delete the SeedRandom call altogether to get new random numbers every time.)
SeedRandom[10]
dataCurrent=10^-3 Table[k+1/10 RandomVariate[NormalDistribution[]],{k,0,10,1}];
dataCurrent[[1]]=0;
dataVoltage=Table[k+k/24 RandomVariate[NormalDistribution[]],{k,0,100,10}];
dataVoltage[[1]]=0;
dataCombined={dataCurrent,dataVoltage}\[Transpose];
The first regression
It's usually a good idea to look at the data to check whether it's any good before actually working with it. For example, you might have mixed up your voltage and temperature files; in that case, the data sure won't have a linear dependency. Simply use ListPlot on the combined data:
dataPlot=ListPlot[dataCombined,PlotStyle->{PointSize[0.01],ColorData[1][2]}];
Show[dataPlot,AxesOrigin->{0,0},AxesLabel->{"I [A]","U [V]"}]


Alright, that's linear. But how linear? And what are the errors? Time to bring in the actual tool of the trade, LinearModelFit.
fit=LinearModelFit[dataCombined,x,x]
FittedModel[1.21269 +9766.47 x]
The variable fit now contains a whole lot of information. Two notes: a) Make sure x wasn't assigned a value before, as something like 1.2442 doesn't make a good fit variable. To unset it, just evaluate x=. before using it as a fit variable; b) I've typed LinearModelFit[data] (without the x,x) so often, and I still make that mistake. It will not work like that, so if you're running into some weird errors, check whether you've provided three arguments, not only one.
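Regarding a), unsetting a stray value looks like this (a trivial illustration):
x=3;   (* suppose x accidentally holds a value from an earlier calculation *)
x=.    (* unset it again; Clear[x] does the same *)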
Let's walk through the most important information you can extract from that FittedModel object.
fit can be used as an ordinary function of a single variable. The result will simply be the value of the regression at that x value.
fit[0.0018]
18.7923
This can, for example, be used to plot the regression function:
fitPlot=Plot[fit[x],{x,0,0.01},PlotStyle->ColorData[1][1]];
Show[dataPlot,fitPlot,AxesOrigin->{0,0},AxesLabel->{"I [A]","U [V]"}]


You can ask for the fit parameters the following way:
fit["ParameterTable"]
     Estimate   Standard Error   t-Statistic   P-Value
1    1.21269    1.15969          1.04571       0.322968
x    9766.47    195.144          50.0474       2.54721*10^-12
The first line is the offset, the second one the slope of the linear regression, along with standard errors (σ) and P-values ("how probable is it to get this result from random data?").
Actually getting the regression data in a form you can work with is done similarly to the parameter table:
fitParameters=fit["ParameterTableEntries"]
{{1.21269,1.15969,1.04571,0.322968},{9766.47,195.144,50.0474,2.54721*10^-12}}
Oh, now that looks convenient ... not. However, if you take a closer look, you'll see that the first sub-list has the same values as the first line in the table above. Thankfully, this isn't a coincidence; ParameterTableEntries basically returns the ParameterTable in non-prettyprinted list form, which can then actually be used in further calculations. For example, the slope of the graph corresponds to the resistance, so let's extract that value. (If you're not familiar with "[[]]", have a look at the Mathematica help page on Part.)
r=fitParameters[[2,1]];
σr=fitParameters[[2,2]];
(* Prettyprint it *)
Row[{"Resistivity: ",r," ± ",σr," Ω"}]
Resistivity: 9766.47 ± 195.144 Ω
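If all you need are the estimates and their standard errors, the documented FittedModel properties "BestFitParameters" and "ParameterErrors" give the same numbers more directly, as an alternative to Part-indexing the table entries:
fit["BestFitParameters"] (* {offset, slope} *)
fit["ParameterErrors"]   (* their standard errors *)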
Weighted regression
When your data points come with different errors, it's not a good idea to let them all influence the regression equally. In the example above, the measurement for zero current was zero voltage, which should obviously be the case exactly. However, the regression has a linear offset, i.e. does not pass through (0, 0). It would be a good idea to impose this as a condition on the linear regression; this can be done using the Weights option. This option takes either a list or a function; let's start with the function version.
Weight function
fitWeighted1=LinearModelFit[dataCombined,x,x,Weights->(If[#==0,10^24,1]&) ]
FittedModel[3.87956*10^-24+9938.59 x]
Soo, what does this "Weights" thing above do? Well, for every point of the data set, the supplied function is called. The function specified checks whether the current data point ("#") is equal to zero. If that is the case, it is assigned weight 10^24 (unfortunately, ∞ is not supported as a weight, but 10^24 should be good enough for our purposes), otherwise 1. Let's check the ParameterTable again.
fitWeighted1["ParameterTable"]
     Estimate         Standard Error   t-Statistic      P-Value
1    3.87956*10^-24   2.18798*10^-12   1.77312*10^-12   1
x    9938.59          111.01           89.5287          1.37157*10^-14
Tadaa, the offset is essentially zero. (Ignore the awful relative error and value here.)
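In case the # and & shorthand looks cryptic: the same weight can be written with an explicit Function, a more verbose notation for the identical fit:
(* Equivalent to If[#==0,10^24,1]&, just spelled out *)
fitWeighted1=LinearModelFit[dataCombined,x,x,Weights->Function[w,If[w==0,10^24,1]]];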
Here's the plot again, with the old regression drawn dashed:
fitUnweightedPlot=Plot[fit[x],{x,0,0.01},PlotStyle->{Dashed,ColorData[1][1]}];
fitWeighted1Plot=Plot[fitWeighted1[x],{x,0,0.01},PlotStyle->ColorData[1][1]];
Show[
dataPlot,
fitUnweightedPlot,
fitWeighted1Plot,
AxesOrigin->{0,0},AxesLabel->{"I [A]","U [V]"}
]


Weight list: Error weighted regression
Usually, the weight function won't be available as a mathematical expression, but as a list of errors. The data used so far was scattered pretty horribly for a simple U/I measurement, but let's make things even worse and assume the voltmeter has an error of 10 % + 5 V from 0 to 25 V, 5 % + 5 V from 25 to 50 V, and 5 % + 7.5 V from 50 to 100 V.
calculateError=v\[Function]Piecewise[{
    {0,v==0},
    {0.1 v+5,v<25},
    {0.05 v+5,v<50},
    {0.05 v+7.5,True}
}]
σdataVoltage=calculateError/@dataVoltage
{0,6.02306,6.99214,6.52972,7.09684,7.37975,10.6136,11.1499,11.5614,11.9535,12.2998}
(The piecewise bracket is entered using ESC pw ESC, the \[Function] arrow is shorthand notation for Function and is entered as ESC fn ESC, and /@ is the short version of Map. As usual, refer to the excellent Mathematica help in case you don't understand certain commands.)
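If you prefer to avoid escape-sequence input entirely, here's an equivalent definition in long form (same function, different notation; a fresh name so it doesn't collide with the definition above):
(* Long form: Piecewise[{{value, condition}, ...}, default], Map instead of /@ *)
calculateError2[v_]:=Piecewise[{{0,v==0},{0.1 v+5,v<25},{0.05 v+5,v<50}},0.05 v+7.5];
σdataVoltage=Map[calculateError2,dataVoltage];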
I won't get into doing plots with error bars here, so ignore the code and have a look at the resulting plot only.
errorPlotData={{dataCurrent,dataVoltage}\[Transpose],ErrorBar/@σdataVoltage}\[Transpose];
dataErrorPlot=ErrorListPlot[errorPlotData,PlotStyle->{PointSize[0.01],ColorData[1][2]}];
Show[
dataErrorPlot,
AxesOrigin->{0,0},AxesLabel->{"I [A]","U [V]"}
]


Alright then, time for the regression. A common choice for the weights is the inverse square of the error of the corresponding data point. A word on the option VarianceEstimatorFunction: it determines the impact of the individual weights on the overall result. Mathematica uses a more complicated model by default; 1& is the choice that reproduces the classical "draw lines of minimum/maximum slope" behavior. For now, I strongly suggest you set VarianceEstimatorFunction to 1& if and only if you want a classical error-weighted regression.
fitWeighted2=LinearModelFit[dataCombined,x,x,Weights->1/σdataVoltage^2,VarianceEstimatorFunction->(1&)];
Power::infy: Infinite expression 1/0 encountered. >>
LinearModelFit::wts: The value of option Weights -> {ComplexInfinity,0.0275655,0.0204541,0.0234536,0.019855,0.0183619,0.00887714,0.00804372,0.00748133,0.00699862,0.00661} should be a list of real numbers or a pure function. >>
Damn, division by zero. The problem with infinite weight again. Let's use the same solution as above: make the weight veeeery huge, but not infinite. The zero here is the error of the point (0, 0), i.e. the first entry of the σdataVoltage list.
σdataVoltage[[1]]=10^-24;
That should be small enough. Now let's retry the fit.
fitWeighted2=LinearModelFit[dataCombined,x,x,Weights->1/σdataVoltage^2,VarianceEstimatorFunction->(1&)];
Way better. Now check the result:
fitWeighted2["ParameterTable"]
fitParameters=fitWeighted2["ParameterTableEntries"];
rErrorWeighted=fitParameters[[2,1]];
σrErrorWeighted=fitParameters[[2,2]];
Row[{"Resistivity: ",rErrorWeighted," ± ",σrErrorWeighted," Ω"}]
fitWeighted2Plot=Plot[fitWeighted2[x],{x,0,0.01},PlotStyle->ColorData[1][1]];

Show[
dataErrorPlot,
fitWeighted2Plot,
AxesOrigin->{0,0},AxesLabel->{"I [A]","U [V]"}]
     Estimate        Standard Error   t-Statistic     P-Value
1    6.1192*10^-48   1.*10^-24        6.1192*10^-24   1
x    9956.15         530.777          18.7577         1.59727*10^-8

Resistance: 9956.15 ± 530.777 Ω
Confidence bands
Fitting in Mathematica is far more powerful than just finding the fit parameters. One particularly nice feature is calculating confidence bands. Roughly speaking, the 90 % confidence band marks the region around the best fit in which the true curve lies with 90 % probability. Although the confidence bands appear to be straight lines here, the underlying function is a bit more complicated (nonlinear!).
confidenceLevel=.68;
confidenceBands[x_,c_]:=fitWeighted2["MeanPredictionBands",ConfidenceLevel->c]
cBStyle=Directive[{Opacity[1/6],ColorData[1][1]}];
confidencePlot=Plot[confidenceBands[x,confidenceLevel],{x,0,0.01},Filling->{1->{2}},Evaluated->True,PlotStyle->{{Opacity[1/2],ColorData[1][1]},{Opacity[1/2],ColorData[1][1]}},FillingStyle->{Opacity[1/6],ColorData[1][1]}];
Show[
dataErrorPlot,
fitWeighted2Plot,
confidencePlot,
AxesOrigin->{0,0},AxesLabel->{"I [A]","U [V]"}]
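By the way, "Properties" is itself a documented property and lists everything a FittedModel object can report:
fitWeighted2["Properties"] (* the full list; ParameterTable is just one entry *)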
This concludes the basic regression tutorial; you should be able to do basic linear regression analysis now. I suggest you read the help page on LinearModelFit, which provides some more examples; in addition, there's a full list of all the special properties that can be extracted from a fit, because ParameterTable was just one of a loooot of them. In the next chapter we'll have a look at NonlinearModelFit, which is basically a LinearModelFit you can tell what function to use for fitting.
Nonlinear fitting
Noisy Gauss curve
Data generation
For this part, we'll be looking at a Gauss curve with random parameters, plus a significant amount of noise on top. Again, you can either delete the SeedRandom call altogether to get a new dataset every time, or change its argument for different but reproducible random numbers.
(Note: In this example, the automatic fit might fail horribly. If that is the case, simply generate a new data set and hope it's more suitable to my default parameters assumed somewhere below.)
SeedRandom[27]
(* {μ, a, σ, y0} *)
dataParameters={RandomReal[{-π,π}],RandomReal[{1/2,2}],RandomReal[{1/2,3}],RandomReal[{1/2,3}]};
Block[{x,μ=dataParameters[[1]],a=dataParameters[[2]],σ=dataParameters[[3]],y0=dataParameters[[4]],f,fNoise},
dataX=Range[-10,10,1/10];
f={x,y0}\[Function]y0+a/Sqrt[2 π σ^2] Exp[-((x-μ)^2/(2 σ^2))];
fNoise={x,y0}\[Function]f[x,y0]+1/48 RandomVariate[NormalDistribution[0,a]];
dataY=(x\[Function]fNoise[x,y0])/@dataX;]
dataPlot=ListPlot[{dataX,dataY}\[Transpose],PlotStyle->{ColorData[1][2],PointSize[0.005]},PlotRange->{.55,1.05}]
Fitting
Fitting in the nonlinear case follows pretty much the same approach as in the linear case, only that you have to take care of a few extra things.
Obviously, you have to provide the fit function yourself. In our case, this is a Gauss curve of the form y0 + a/Sqrt[2 π σ^2] Exp[-((x-μ)^2/(2 σ^2))].
No matter what, the same dataset will always yield the same linear regression. This is not the case for nonlinear models, since fitting here is an iterative procedure instead of a straightforward formula. You will have to provide starting values for your fit, or Mathematica may give you astronomically wrong parameters; these starting values should be at least in the order of magnitude of what you're expecting (see the small sketch after these points).
You can do most of the things possible in LinearModelFit with NonlinearModelFit as well, for example displaying the parameter table, assigning weights etc.
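As announced, here's a small self-contained toy sketch of the starting-value problem (data and symbols invented for the demonstration): fitting the frequency of a sine with a sensible and with a deliberately bad starting value can end up in different local optima.
(* Toy data: sine of frequency 2 plus a little noise *)
toyData=Table[{t,Sin[2 t]+0.05 RandomReal[{-1,1}]},{t,0,5,0.1}];
goodStart=NonlinearModelFit[toyData,Sin[w t],{{w,1.8}},t];
badStart=NonlinearModelFit[toyData,Sin[w t],{{w,10}},t];
{goodStart["BestFitParameters"],badStart["BestFitParameters"]} (* the good start recovers w≈2, the bad one usually doesn't *)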
The fit function in our case has the following form:
NonlinearModelFit[data, modelFunction[x, a1, a2, ...], {{a1, a1Start}, {a2, a2Start}, ...}, x]
data is the data in the usual form of x/y pairs, modelFunction is a function of the fit variable x and the parameters ai. Afterwards, these parameters are specified along with their initial values, and finally the fit variable is given explicitly.
So let's do the fit and have a look at the result.
fit=Block[{y0,a,σ,μ},
NonlinearModelFit[
{dataX,dataY},
{y0+a/Sqrt[2 π σ^2] Exp[-((x-μ)^2/(2 σ^2))],σ>0},
{{a,1},{σ,1},{μ,0},{y0,0}},
x
]
];
fit["ParameterTable"]//Quiet
fitPlot=Plot[fit[x],{x,-10,10}];
Show[dataPlot,fitPlot,PlotRange->{.55,1.05}]
     Estimate   Standard Error   t-Statistic   P-Value
a    1.11637    0.0255077        43.7662       1.94121*10^-103
σ    1.34598    0.0310114        43.4026       8.61198*10^-103
μ    0.205065   0.0284962        7.19622       1.26815*10^-11
y0   0.63899    0.0021267        300.461       3.48367*10^-264
Looks pretty good, but how close did the fit parameters come to the ones used to actually generate the data?
pte=fit["ParameterTableEntries"]//Quiet;
round=xRound[Abs[x],0.001];
tableData={
{"","Fit","Generation","Relative error"},
{"a",pte[[1,1]]±pte[[1,2]],dataParameters[[2]],round[1-dataParameters[[2]]/pte[[1,1]]]},
{"σ",pte[[2,1]]±pte[[2,2]],dataParameters[[3]],round[1-dataParameters[[3]]/pte[[2,1]]]},
{"μ",pte[[3,1]]±pte[[3,2]],dataParameters[[1]],round[1-dataParameters[[1]]/pte[[3,1]]]},
{"Subscript[y, 0]",pte[[4,1]]±pte[[4,2]],dataParameters[[4]],round[1-dataParameters[[4]]/pte[[4,1]]]}
};
Grid[tableData,Dividers->{{2->True},{2->True}}]
     Fit                  Generation   Relative error
a    1.11637±0.0255077    1.13252      0.014
σ    1.34598±0.0310114    1.37806      0.024
μ    0.205065±0.0284962   0.173358     0.155
y0   0.63899±0.0021267    0.635896     0.005
Confidence revisited
The confidence bands we've drawn on the resistance graph in the previous chapter can be used on nonlinear models as well.
confidenceLevel=1-10^-9;
confidenceBands[x_]=Quiet[fit["MeanPredictionBands",ConfidenceLevel->confidenceLevel]];
cBStyle=Directive[{Opacity[1/6],ColorData[1][1]}];
confidencePlot=Plot[confidenceBands[x],{x,-10,10},Filling->{1->{2}},FillingStyle->{Opacity[1/6],ColorData[1][1]},Evaluated->True,
PlotStyle->{{Opacity[2/3],ColorData[1][1]},{Opacity[2/3],ColorData[1][1]}}
];
Show[
fitPlot,
confidencePlot,
dataPlot,
PlotRange->{.55,1.05}]
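A closely related documented property is "SinglePredictionBands": instead of bounding the mean of the model, it bounds where a single new observation is expected to land, so the bands come out wider. Swapping the property name is all it takes:
(* Bands for individual future observations rather than the model mean *)
singleBands[x_]=Quiet[fit["SinglePredictionBands",ConfidenceLevel->confidenceLevel]];
Plot[{fit[x],singleBands[x]},{x,-10,10},Evaluated->True]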
A real world example
The following data was taken for an advanced lab course experiment on nuclear magnetic resonance. The x values correspond to time, the y values represent the signal. I won't go into detail about what this data represents exactly; all we need to know is that the big slope is assumed to be shaped like an error function.
Assembling the data
(* Opening the notebook dir should really get a shorter command *)
dataMeasured=Import[NotebookDirectory[]<>"dataMeasured.csv"];
(* Use diluted data for large datasets. ListPlot will plot all points, and you really don't want to display a million dots and then try resizing your graphics. *)
dilutedListPlot[list_List]:=ListPlot[list[[1;;-1;;Ceiling[Length[list]/256]]]]
dilutedListPlot[dataMeasured]
The part we're interested in is the large slope, so we'll have to extract that part and remove the unwanted regions. This can easily be done with a right click on the plot and then choosing "Get Coordinates". I've pasted my result below, and then extracted that region from the original dataset.
region={{3.338,-0.0322},{11.57,-0.5766}};
data=Select[dataMeasured,#[[1]]>=Min[region[[All,1]]]∧#[[1]]<=Max[region[[All,1]]]&];
Grid[{
{"Data points","before",dataMeasured//Length},
{"","after",data//Length}
}]
dilutedListPlot[data]
Data points   before   2500
              after    824
Fitting
As previously stated, the data is assumed to follow an error function in this region. The first thing to do after you've decided what function to use is to figure out what the start parameters should look like.
The total amplitude is around 0.6, and in the opposite direction of the error function. Hence -0.6 sounds reasonable.
The slope has a width of around one unit (here: seconds), so 0.5 for σ will do.
The center of the slope is around (μ, y0) = (5.5, -0.3).
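Before firing off the fit, it can be worth overlaying this guess on the data to check that it's in the right ballpark; a quick, optional sanity check:
(* Plot the start-parameter guess on top of the (diluted) data *)
guessPlot=Plot[-0.6/2 Erf[(x-5.5)/(Sqrt[2] 0.5)]-0.3,{x,3.3,11.6},PlotStyle->Dashed];
Show[dilutedListPlot[data],guessPlot]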
fit=NonlinearModelFit[data,a/2 Erf[(x-μ)/(Sqrt[2]σ)]+y0,{{a,-0.6},{σ,0.5},{μ,5.5},{y0,-0.3}},x];
fit["ParameterTable"]
     Estimate    Standard Error   t-Statistic   P-Value
a    -0.539004   0.000904544      -595.885      1.162573532156*10^-1083
σ    0.178631    0.00232465       76.8421       6.17299370652*10^-377
μ    5.61703     0.00167277       3357.91       5.23081946807*10^-1699
y0   -0.303107   0.000447099      -677.941      1.637483164089*10^-1129
Presentation
Time to look at the actual result, presented so that one could actually print it.
dataPlot=ListPlot[data,PlotStyle->{PointSize[0.0025],ColorData[1][2]}];
fitPlot=Plot[
fit[x],
{x,Min[data[[All,1]]],Max[data[[All,1]]]},
PlotStyle->{Thickness[0.0025],Opacity[0.8]},Evaluated->True];
confidenceLevel=1-10^-15;
confidenceBands[x_,c_]:=fit["MeanPredictionBands",ConfidenceLevel->c]
cBStyle=Directive[{Opacity[1/6],ColorData[1][1]}];
confidencePlot=Plot[confidenceBands[x,confidenceLevel],{x,Min[data[[All,1]]],Max[data[[All,1]]]},Filling->{1->{2}},Evaluated->True,PlotStyle->{{Opacity[1/3],ColorData[1][1]},{Opacity[1/3],ColorData[1][1]}},FillingStyle->{Opacity[1/8],ColorData[1][1]}];
pte=fit["ParameterTableEntries"];
Show[
dataPlot,
fitPlot,
confidencePlot,
Graphics[
Riffle[
{Dashed,Dotted,Dotted},
Line[{{#,0},{#,Min[data[[All,2]]]}}]&/@{pte[[3,1]],pte[[3,1]]+pte[[2,1]],pte[[3,1]]-pte[[2,1]]}]],
Graphics[Style[#,18]&@Text[μ±σ,{pte[[3,1]],0.025}]],
AxesLabel->{"t [s]","S [arb. u.]"},
ImageSize->800]
Conclusion
Fitting is a very powerful feature and can do very impressive things, but just reading about it won't be enough. It cannot be stated often enough: this was an introduction only. Read the corresponding help files, then write code yourself. I assure you this is the best way to learn Mathematica, if not the only one.
Initialization
Don't modify this or the notebook won't run. 
Needs["ErrorBarPlots`"]
SetOptions[#,PlotRange->All,ImageSize->600,LabelStyle->16]&/@{Plot,ListPlot,ListLinePlot};
