Experiments in photometry
1 Attachment(s)
Tom provided me with some images of the Pleiades. Pretty pictures derived from that data have been posted in another thread. Since then I've been learning how to do photometry.
First off, all the images with reasonably round stars were stacked together to increase the SNR. Then the DAOPHOT package in IRAF was used to find the stars in the DSLR green channel and perform aperture photometry --- an interesting learning exercise in its own right. This produced a list of around 20K "stars", some of which are doubtless peaks in the sky noise. Some genuine stars were missed because their images were saturated. Star coordinates are in pixels and magnitudes have an arbitrary zero offset.

A search of the NOMAD1 catalogue for RA,Dec coordinates and BVR magnitudes yielded 5429 entries when constrained to a 2-degree square with all magnitudes brighter than 17.0. IRAF's wcsctran utility converted the catalogue's coordinates to pixels and then xyxymatch produced a list of correspondences between the two files. From this data I made a file with measured magnitudes (which I call M) matched with the corresponding B, V and R magnitudes --- 1239 of them.

Then the fun began. Some matches were clearly coincidences --- faint NOMAD1 stars matched with noise. Some were double identifications: two close NOMAD1 stars unresolved in Tom's images. To see what was going on, the file was loaded into R (the data-analysis package, not the photometry band!) and various least-squares fits performed. This was in itself a learning exercise because I really don't know how to drive R.

As expected, the faintest matches showed enormous residuals, so I slung out everything fainter than V=14 and refitted. The initial fits showed what background reading had led me to expect: the DSLR green channel is very well correlated with the V band but less so with B and R. Good. They also showed a number of outliers scattered pretty much at random, presumably due to false matches to faint stars or unresolved doubles, and quite a lot of scatter at the bright end, where it's much more difficult to measure the star images. Removing those yielded the following results. 
[code]lm(formula = M ~ V, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-0.66014 -0.15829 -0.02395  0.13325  0.88001

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.560642   0.086998   40.93   <2e-16 ***
V           1.000470   0.007241  138.17   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2242 on 450 degrees of freedom
Multiple R-squared: 0.977, Adjusted R-squared: 0.9769
F-statistic: 1.909e+04 on 1 and 450 DF, p-value: < 2.2e-16[/code]Plotting the data suggested a quadratic might give better results (I have no theoretical justification). This happened:

[code]lm(formula = M ~ V + V2, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-0.58434 -0.14479 -0.01454  0.10051  0.85987

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.093384   0.433187  16.375  < 2e-16 ***
V           0.337277   0.080168   4.207 3.12e-05 ***
V2          0.030318   0.003652   8.302 1.21e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.209 on 449 degrees of freedom
Multiple R-squared: 0.98, Adjusted R-squared: 0.9799
F-statistic: 1.102e+04 on 2 and 449 DF, p-value: < 2.2e-16[/code]The fit is not noticeably better. A plot of the residuals from the linear fit is attached. It's broadly as I expected, with more scatter at the faint end, but the residuals are disappointingly large and there's an obvious correlation up to V=11 or so.

Not sure how to proceed from here, but I suspect that paying closer attention to DAOPHOT would be a good start. Alternatively, perhaps the data just isn't good enough for photometry below V=11. This series of harangues has been documenting my attempt to educate myself. I think I'm learning, slowly. |
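For anyone who wants to reproduce the shape of this comparison outside R, here is a small Python/NumPy sketch of the same two models, lm(M ~ V) and lm(M ~ V + V2). The data are synthetic (the coefficients and scatter are invented, not Tom's measurements); the point is only to show the mechanics of comparing a linear and a quadratic least-squares fit.

```python
import numpy as np

# Synthetic stand-in for the matched-catalogue data: a linear
# M-V relation with Gaussian scatter (coefficients invented).
rng = np.random.default_rng(1)
V = rng.uniform(7.0, 14.0, 400)
M = 3.56 + 1.0 * V + rng.normal(0.0, 0.22, 400)

# Degree-1 and degree-2 least-squares fits, the analogues of
# lm(M ~ V) and lm(M ~ V + V2) with V2 = V^2.
lin = np.polyfit(V, M, 1)
quad = np.polyfit(V, M, 2)

# Residual RMS of each model; the quadratic can only do as well
# or better on the data it was fitted to.
rms_lin = np.sqrt(np.mean((M - np.polyval(lin, V)) ** 2))
rms_quad = np.sqrt(np.mean((M - np.polyval(quad, V)) ** 2))
```

As with the real data above, when the underlying relation is genuinely linear the extra quadratic term buys almost nothing: the in-sample RMS barely moves.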
One way of combatting the heteroscedasticity you are seeing is log-transforming the dependent variable. Once the result is back-transformed you should be able to predict M better with V, if that is your aim.
It looks like you may be calculating the residuals by hand. A shortcut is the residuals function. Plotting the linear model object produces some plots that might help identify whether there are points that are huge outliers skewing your linear model. I would imagine that some of the residuals are from misclassified stars, for example. It does look like there is something that significantly changes when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11?

Assuming that the camera was perfect you should expect a fit of 0+1*V. Given that the slope is very close to this, I am surprised by the apparent slope below V=11. I suspect that there are two separate things going on, messing up this fit. Any modelling will only end up correcting errors, since we know that the slope should be 1. |
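To make the log-transform suggestion concrete, here is a Python/NumPy sketch on synthetic heteroscedastic data (all numbers are invented for illustration; in R the equivalent would be fitting lm(log(y) ~ x) and applying exp() to the predictions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, 300)
# Heteroscedastic data: the scatter grows with the mean of y.
y = 2.0 * np.exp(0.3 * x) * rng.lognormal(0.0, 0.2, 300)

# Fit the log-transformed dependent variable, where the noise
# is roughly constant across the range of x...
b, a = np.polyfit(x, np.log(y), 1)

# ...then back-transform to predict y on the original scale.
y_hat = np.exp(a + b * x)
```

On the log scale the residual spread is roughly constant, so ordinary least squares is better behaved; the back-transformed predictions then track the original, fanning-out data.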
[QUOTE=henryzz;457110]One way of combatting the heteroscedasticity you are seeing is log transforming the dependent variable. Once the result is back transformed you should be able to predict M better with V, if that is your aim.
It looks like you may be calculating the residuals by hand. A shortcut is the residuals function. Plotting the linear model object produces some plots that might help identify if there are points that are huge outliers skewing your linear model. I would imagine that some of the residuals are from misclassified stars for example. It does look like there is something that significantly changes when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11? Assuming that the camera was perfect you should expect a fit of 0+1*V. Given that the slope is very close to this, I am surprised by the apparent slope below V=11. I suspect that there are two separate things going on messing up this fit. Any modelling will only end up correcting errors since we know that the slope should be 1.[/QUOTE]I didn't quote everything. In practice, I ran foo <- lm (...) and followed up with residuals(foo) and plot(foo). Those were used iteratively to remove outliers and to suggest where the limiting magnitude should be.

Not sure why you are surprised. The fit is [code]V 1.000470 0.007241 138.17 <2e-16 ***[/code] which says the slope differs from unity by about 1/15 of the standard error in its estimate. Seems pretty convincing to me. Anyway, the correlation need not be perfect, because the absorption curves of the standard V filter and the Nikon G filter are quite different: if there is a systematic trend in the stellar population (fainter stars tending to have a more prominent absorption line, for example) the correlation would be expected to be poorer.

The fit should [b]not[/b] have an intercept of zero, because the photometry was performed with respect to an arbitrary zero magnitude. Without precise knowledge of the sky brightness in terms of photons per square arcsec per second, an equally precise knowledge of the properties of the optics and detectors, and knowledge of the air mass between the star and the detector, it's impossible to perform absolute photometry. 
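For concreteness, the "arbitrary zero magnitude" amounts to the following (a Python sketch; the counts, aperture size and zero point are invented numbers, not values from Tom's images):

```python
import numpy as np

def instrumental_mag(aperture_counts, sky_per_pixel, n_pixels, zp=25.0):
    """Instrumental magnitude from aperture photometry: subtract
    the estimated sky contribution inside the aperture, then
    convert the net flux to a magnitude. The zero point zp is
    arbitrary unless the system is absolutely calibrated."""
    net = aperture_counts - sky_per_pixel * n_pixels
    return zp - 2.5 * np.log10(net)

m1 = instrumental_mag(20000.0, 50.0, 100)
m2 = instrumental_mag(35000.0, 50.0, 100)
# Magnitude *differences* are unaffected by the choice of zp,
# which is why differential photometry still works.
```

A star with twice the net flux always comes out 2.5 log10(2) ≈ 0.753 mag brighter, whatever zp is; only the intercept of the M-versus-V fit depends on the zero point.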
Differential photometry by its very nature typically has a non-zero constant offset. At the faint end there will inevitably be much greater scatter because the estimated star photons become an ever smaller fraction of the estimated sky photons and of the intrinsic noise in the detector. Why there appears to be a breakdown in linearity escapes me at the moment. It may well arise from my inexperience at performing the initial aperture photometry. That's something I'm going to explore further.

Thanks for the tip about log transformation. Something else to explore. I can see that you know vastly more about R and data modelling than I do. Consider yourself on notice for further questioning. :wink: |
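The faint-end argument can be quantified with the usual Poisson error estimate: the 1-sigma error in magnitudes is roughly 1.0857 × sqrt(N_star + N_sky) / N_star, so as the star's photons shrink relative to the sky's, the scatter blows up. A sketch (the photon counts are invented):

```python
import numpy as np

def mag_error(n_star, n_sky):
    """Approximate 1-sigma photometric error in magnitudes from
    Poisson noise on star + sky photons inside the aperture
    (1.0857 = 2.5 / ln 10)."""
    return 1.0857 * np.sqrt(n_star + n_sky) / n_star

bright = mag_error(100_000.0, 5_000.0)  # star dominates the sky
faint = mag_error(1_000.0, 5_000.0)     # sky dominates the star
```

With these invented counts the faint star's magnitude error is more than an order of magnitude larger than the bright star's, which is exactly the fanning-out seen in the residual plots.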
1 Attachment(s)
[QUOTE=henryzz;457110]
It does look like there is something that significantly changes when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11? Assuming that the camera was perfect you should expect a fit of 0+1*V. Given that the slope is very close to this, I am surprised by the apparent slope below V=11. I suspect that there are two separate things going on messing up this fit. Any modelling will only end up correcting errors since we know that the slope should be 1.[/QUOTE]Ah, now I understand why you are surprised. The fit for V<11 is summarized as

[code]summary (foo)

Call:
lm(formula = M ~ V, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-0.16135 -0.04791 -0.01078  0.04423  0.18402

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.803201   0.085455   56.21   <2e-16 ***
V           0.868028   0.009436   92.00   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.07683 on 60 degrees of freedom
Multiple R-squared: 0.993, Adjusted R-squared: 0.9928
F-statistic: 8463 on 1 and 60 DF, p-value: < 2.2e-16[/code]

and the slope is significantly different from 1. Other than handwaving about different spectral passbands I have no explanation. The residual plot is attached. It looks fine to my untutored eye. |
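"Significantly different from 1" can be read straight off the summary: the t-statistic for the hypothesis slope = 1 uses the same standard error that R prints, just centred on 1 instead of 0.

```python
# t-statistic for H0: slope = 1, using the estimate and standard
# error from the V < 11 fit summary above (60 degrees of freedom).
slope = 0.868028
se = 0.009436
t = (slope - 1.0) / se
# |t| is roughly 14, far beyond any usual critical value (~2 at
# the 5% level with 60 dof), so the departure from unit slope is
# highly significant.
```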
[QUOTE=xilman;457115]Ididn't quote everything. In practice, I ran foo <- lm (...) and followed up with residuals(foo) and plot(foo). Those were used iteratively to remove outliers and suggest where the limiting magnitude should be.
[/quote] It wasn't clear how much you had done/knew about R. Care is needed when removing outliers. Any fit will be good if all the outliers are removed. [quote] I can see that you know vastly more about R and data modelling than I. Consider yourself on notice for further questioning. :wink:[/QUOTE] It's my job. :smile: |
[QUOTE=xilman;457116]
The residual plot is attached. It looks fine to my untutored eye.[/QUOTE] Depends on how fussy you are. Ideally the lowess fit would be a straight line. There is still a bit too much of a pattern for me.

I don't think we are going to win with this data. I suspect that a piecewise model is needed. Even then, whenever we divide the dataset there is still some appearance of a pattern. I would not like to get this data from a medical trial. |
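A minimal version of the piecewise idea is a "broken-stick" regression: one extra hinge term at an assumed break point, here V = 11. A Python/NumPy sketch on synthetic data (the break location, coefficients and scatter are all invented):

```python
import numpy as np

def broken_stick_fit(V, M, vbreak=11.0):
    """Least-squares fit of M = a + b*V + c*max(V - vbreak, 0):
    the slope is b below the break and b + c above it."""
    hinge = np.maximum(V - vbreak, 0.0)
    X = np.column_stack([np.ones_like(V), V, hinge])
    coef, *_ = np.linalg.lstsq(X, M, rcond=None)
    return coef  # (a, b, c)

# Synthetic data with a genuine slope change at V = 11.
rng = np.random.default_rng(0)
V = rng.uniform(8.0, 14.0, 500)
M = 4.8 + 0.87 * V + 0.13 * np.maximum(V - 11.0, 0.0) + rng.normal(0.0, 0.05, 500)
a, b, c = broken_stick_fit(V, M)
```

In R the same model would be lm(M ~ V + pmax(V - 11, 0)); a significant coefficient on the hinge term is evidence for a slope change at the chosen break point.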
[QUOTE=henryzz;457133]I suspect that a piecewise model is needed. Even then whenever we divide the dataset there is still some appearance of a pattern.[/QUOTE]
Just putting this out there for consideration... [URL="http://opencv.org/"]OpenCV[/URL] was initially underwritten by Intel, and is still being supported. It codifies known and published algorithms. The libraries are licensed under the BSD licence. Could this possibly help? |
[QUOTE=henryzz;457133]Depends on how fussy you are. Ideally the lowess fit would be a straight line. There is still a bit much of a pattern for me.
I don't think we are going to win with this data. I suspect that a piecewise model is needed. Even then whenever we divide the dataset there is still some appearance of a pattern. I would not like to get this data from a medical trial.[/QUOTE]Quite often in astronomy you have to take the data you are given. That's often the case in medicine too but you can run more trials to examine a long-lived phenomenon in either science. Short-lived phenomena, such as a cometary collision with Jupiter or the teratogenic effects of Thalidomide are more constrained. General philosophy aside, I can make the data available for others to analyze, though I suspect not many here will want half a gig of NEF-format images. More likely is the table of matched NOMAD-1 values and DAOPHOT-reduced observational magnitudes and their estimated errors. Given a day or so I could also download neighbouring areas of NOMAD-1 data and match them with the observations. This effort should produce another few thousand entries. |
[QUOTE=henryzz;457110]It does look like there is something that significantly changes when V reaches ~11. Would the sensor lose accuracy once the "correct" V goes above 11?[/QUOTE]Another thought occurred overnight. Some of the stars are almost certain to be variables. If their brightness at the time of observation was significantly different from their catalogued value the residuals would be larger. There are (at least) two issues which could bias the selection. Those variables which are too faint to reach the cut-off would be removed from consideration, despite reaching the NOMAD-1 criterion. Another is that I do not know, yet, whether NOMAD-1 records maximum, minimum or some kind of weighted mean magnitudes for variables. Whichever is chosen there is an opportunity for bias.
|
4 Attachment(s)
OK, here's the summary of the raw fit to four times as many catalogue entries. Raw in that no attempt has been made to remove outliers or data with magnitudes way fainter than can be plausibly justified.
To be clear, what I'm interested in is finding V given a value of M measured by aperture photometry on a stacked set of images.

[code]> summary (foo)

Call:
lm(formula = V ~ M, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-6.0316 -0.5887 -0.4071 -0.0404  6.9252

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.06930    0.24615  -8.407   <2e-16 ***
M            0.93904    0.01507  62.326   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.151 on 2886 degrees of freedom
Multiple R-squared: 0.5737, Adjusted R-squared: 0.5736
F-statistic: 3884 on 1 and 2886 DF, p-value: < 2.2e-16[/code]

All four plots this time because the others may mean more to some people (specifically henryzz) than they do to me and might tell us something important. |
ISTM from fit1.jpg that if I arbitrarily remove all data with abs(residual) greater than, say, 0.1 the remainder will enable me to predict V from M as required. I've absolutely [b]no[/b] theoretical justification for making that decision. Any ideas and analysis are very welcome.
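On the residual cut: rather than a single fixed threshold, a common and slightly less arbitrary procedure is iterative sigma-clipping on the fit residuals. A Python/NumPy sketch on synthetic data (the 3-sigma choice is conventional rather than derived, and the planted outliers stand in for false catalogue matches):

```python
import numpy as np

def sigma_clipped_fit(x, y, nsigma=3.0, max_iter=5):
    """Iteratively fit y ~ x by least squares, dropping points
    more than nsigma residual standard deviations from the line,
    until the kept set stops changing."""
    keep = np.ones(len(x), dtype=bool)
    for _ in range(max_iter):
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        resid = y - (slope * x + intercept)
        new_keep = np.abs(resid) <= nsigma * resid[keep].std()
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return slope, intercept, keep

# A clean linear relation with noise, plus some gross outliers.
rng = np.random.default_rng(3)
x = np.linspace(8.0, 16.0, 200)
y = -2.0 + 0.94 * x + rng.normal(0.0, 0.1, 200)
y[::20] += 4.0
slope, intercept, keep = sigma_clipped_fit(x, y)
```

The threshold adapts as the fit tightens, so the cut is driven by the scatter of the data that survive rather than by a number picked by eye.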
|