D0 SingleT events:

Evidence for single top by D0, December 12, 2006, posted by dorigo at dorigo.wordpress.com/

Ok, on this one, D0 appears to beat us. 

I recently discussed the complex situation of single top production searches in CDF in http://dorigo.wordpress.com/2006/11/07/the-elusive-single-top/ (and see http://dorigo.wordpress.com/2006/11/20/a-low-mass-top-in-single-top-events/ for an exotic interpretation of those results). To summarize here, despite a huge effort by CDF, no clear indication of the signal is present in the data so far analyzed - one analysis finds a 2.6-sigma excess over backgrounds, but another study based on the same data sees no signal at all; and the 2.something-sigma effect aroused suspicion in some that there might be something unexpected in the data.

Things in D0 are brighter: they gave a wine and cheese seminar at Fermilab four days ago, where Dugan O'Neil showed the results of three different analysis methods, all consistently showing clear evidence for a Standard Model signal of single top production. You can find Dugan's slides at http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/TOP/T39/wine_and_cheese.pdf.

The a-priori best measurement of the set provides a cross section of 4.9+-1.4 picobarns, whereas 2.9+-0.3 pb is the next-to-leading-order theory prediction. This cannot be dubbed an "observation" yet (a word reserved for 5-sigma effects in physics jargon), but it comes close, and it would be very strange if the 3.6-sigma excess of D0 did not grow to observation level in the next few months, as more data are fed to the analysis.
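As a rough cross-check of how the measured cross section compares with the prediction, one can compute a naive pull. This is only a sketch: it assumes both uncertainties are Gaussian and uncorrelated, which is not necessarily how D0 treats them, and it is a different quantity from the 3.6-sigma significance of the excess over background.

```python
import math

# Numbers quoted above, in picobarns.
measured, measured_err = 4.9, 1.4
theory, theory_err = 2.9, 0.3

# Assumption: Gaussian, uncorrelated uncertainties, added in quadrature.
combined_err = math.hypot(measured_err, theory_err)
pull = (measured - theory) / combined_err

print(f"measured - theory = {measured - theory:.1f} pb")
print(f"naive pull = {pull:.2f} sigma")
```

Under these assumptions the measurement sits about 1.4 standard deviations above the theory value, so the two are compatible.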

What's more, D0's excess appears to cluster at the right top mass value, and not at low mass as some of CDF's - an indication that things are going fine and that the standard model still rulez. Indeed, single top production is a purely electroweak process - at least in one of the two production channels - and surprises there would be twice as puzzling. You can see some of the D0 signal in the plot ...

... where the reconstructed top quark mass is plotted for the data and compared to backgrounds (in grey, green, and red) and signal (the blue stuff).

If you are curious about the details of the analyses by D0, I encourage you to have a look at the slides linked above. If you are lazy, I offer below a poor man's description of the whole thing…

Events triggered by the presence of a high-momentum electron or muon are collected in a 0.9 inverse-femtobarn dataset; significant missing transverse energy and two or more jets are required. A neural-network B-tagger very effectively finds jets which are likely to have originated from b-quark hadronization, thus enriching the data with the single top production signal, which should nominally yield a W (yielding the lepton and missing Et) and two b-quark jets (plus an additional light-quark jet in some cases). The data is then studied by a decision tree, which uses many kinematical variables to discriminate the signal from all known background sources. A cut on the decision tree output enriches the surviving data in signal to the level that an excess is observed. The most discriminating kinematical variables (such as the one shown in the plot above) are then studied to verify that the excess sits where it is expected from single top production. From the excess a cross section is computed by taking into account the amount of data analyzed and the selection efficiencies.
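The last step can be sketched as a simple counting formula, cross section = (observed - background) / (efficiency x luminosity). The event counts and the efficiency below are hypothetical numbers chosen only for illustration; just the 0.9 inverse-femtobarn luminosity comes from the text. The real D0 result comes from a more refined statistical treatment, not from this one-line estimate.

```python
def cross_section_pb(n_observed, n_background, efficiency, luminosity_pb_inv):
    """Counting-experiment estimate of a cross section in picobarns.

    efficiency is the overall signal selection efficiency (acceptance and
    branching fraction included); luminosity is in inverse picobarns.
    """
    excess = n_observed - n_background
    return excess / (efficiency * luminosity_pb_inv)

# 0.9 inverse femtobarns = 900 inverse picobarns (from the text).
luminosity = 900.0

# Hypothetical inputs, NOT taken from the D0 analysis.
sigma = cross_section_pb(n_observed=100, n_background=80,
                         efficiency=0.005, luminosity_pb_inv=luminosity)
print(f"estimated cross section = {sigma:.2f} pb")
```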

Still curious? Go to the talk!

Date: Wed, 13 Dec 2006 03:17:27 +0100 (CET)

From: Tommaso Dorigo

(TD reply text in normal type)

To: Tony Smith

(TS original message text in preformatted type)

Subject: Re: D0 singleT

Hi Tony,

as usual, you're welcome, and as usual my answers have a fair chance of being only partial answers to your questions. However I will try to do my homework.

> It may be that my questions are too naive to be useful
> because I don't have much intuition about what DT means physically,
> so please feel free to tell me if that is the case,
> and in that case just ignore the questions asked below in this message.
> On the other hand, if you think that the questions might be useful,
> feel free to post this message including images on your blog entry.

I don't know D0's decision tree well enough myself, but I know the list of variables which are fed into the trees, from slide 24.

There, you can see that they put in the "best" top mass as a kinematical selection variable. Moreover, many other variables which are directly correlated with the top mass itself are fed into the DT. It is a perfectly legal thing to do, but once you do it, you have to be careful in interpreting the results. In particular, a single top production process with a mass different from that with which you built your trees (your "signal") will be treated as a background if the mass difference is large enough to make the branches split regular top and low-mass top differently.

What would tell us if that is really the case is the relative weight that the final trees give to each of the variables. If the top mass is one of the variables given most weight in determining how to classify the event, then any top signal with mass significantly different from 175 GeV would be washed out.

Be careful: "weight" is not a very well-defined quantity here. Some decision tree algorithms have a built-in way to determine a posteriori (i.e. once the trees are built) what weight a variable had in separating signal from backgrounds. Others don't. I would not be surprised if, by asking D0 what weight the "best" top mass has in their DT, you got a perplexed look in return, or worse, a layman's explanation that the DT is not a neural network. But they might also answer with a number straight away :)
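One common way to attach such a weight to a variable is to add up how much each split on that variable reduces the impurity of the sample. The sketch below computes that reduction for a single split using the Gini impurity; the toy events, the variable names and the thresholds are all hypothetical, and nothing here is meant to describe what D0's trees actually do.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = background, 1 = signal)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def impurity_decrease(values, labels, threshold):
    """Gini impurity removed by splitting the sample at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    after = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - after

# Hypothetical toy sample: six events, three background and three signal.
labels = [0, 0, 0, 1, 1, 1]
top_mass = [120.0, 140.0, 150.0, 170.0, 175.0, 180.0]  # separates perfectly
other_var = [1.0, 4.0, 5.0, 2.0, 3.0, 6.0]             # barely separates

print(impurity_decrease(top_mass, labels, 160.0))
print(impurity_decrease(other_var, labels, 3.5))
```

In this toy the split on the mass removes all the impurity while the other variable removes very little, which is the situation in which a signal at a different mass would be classified as background.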

In any event, I have the answer myself. If you look at the plots, they speak to you. The three distributions at DT<0.3, >0.55, and >0.65 are VERY different in the "best" top mass. AND, the high-DT data have a perfectly coincident distribution for all backgrounds and for single top. That is to say, that variable has been totally "squeezed" for its discriminating power by the classifier. In other words, what one can gather from that plot is only the relative normalization of the expected contributions to the data points, since the shapes will be coincident.

A point of relevance: the relative normalization of the various colors tells you indeed that the high-DT data favors the SM single top with respect to backgrounds, as it should. But it does so based on the top mass itself, and therefore that variable is no longer a very good one to display the final result! In fact, one would prefer to keep the most discriminant variable aside, and train one's classifier with the others, being careful to avoid variables that are correlated with the most powerful one: that way, one would retain discriminating power in the best variable _after_ a cut on the classifier's output. That is the strategy adopted for Higgs searches at low mass in CDF, where the Higgs mass is left aside, being very discriminant by itself.

 So, to summarize:


> Attached image D0TqDTs.jpg shows Decision Tree output that
> seems to me to be shown in more detail in the images from slide 47.
>
> Looking at the attached image D0TqDTlt3.jpg showing Tquark mass
> for DT less than 0.3
> it seems to me that the high data points for singleT events
> are in the bins for 100-125 GeV and 150-175 GeV.
> However,
> I guess that low DT might mean that not many singleT events
> are expected, because the low DT histogram shows very little of the blue
> or cyan colors that correspond to expectation of singleT events,
> so
> maybe the low DT data is not very significant ?

Not necessarily. Low DT means low probability of a 175 GeV top, given a lot of final state four-momenta. So a lower mass top quark might get a low grade and end up there. By the way, have you noticed the tell-tale dip at 175 of the W+jets background? That is the sign that events with that mass are preferentially high-DT ones, if there are no more striking characteristics telling them apart from the single top hypothesis - for instance, ttbar does not get such a void at 175 because there are more useful variables to discriminate it from single top, and it clusters at 175 anyway...

>
> Looking at the attached image D0TqDTgt5.jpg showing Tquark mass
> for DT greater than 0.55
> it seems to me that the high data points for singleT events
> are in the bins for 175-200 GeV and 225-250 GeV,
> and
> that the data point for the 150-175 GeV bin is a bit low.

All good - but we are discussing very insignificant flukes here. The error bars are generally larger than any discrepancy...

> Looking at the attached image D0TqDTgt6.jpg showing Tquark mass
> for DT greater than 0.65, which I think is the image on your blog entry,
> it seems to me that the high data points for singleT events
> are in the bins for 125-150 GeV and 175-200 GeV and 225-250 GeV,
> with the 175-200 GeV bin having the highest data point,
> and
> that the data points for the 150-175 GeV and 200-225 GeV bins are low.


> Since the DT greater than 0.65 histogram shows the largest
> amount of blue and cyan singleT contributions, I guess that
> it might be considered the most physically significant histogram
> with respect to seeing singleT events.

Again, not so given what I wrote above.

> It is interesting to me that for DT greater than 0.65 the
> three bins with high data points correspond to the three
> peaks around 125-150 GeV and 175-200 GeV and 225-250 GeV
> that were present back in the early semileptonic histograms
> of the 1990s at CDF and D0.

Ok, but remember we are talking about few events here. At 225-250 there is ONE with a background of 0.3...
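To put a number on how unremarkable that is, here is a minimal sketch of the Poisson probability of seeing at least one event when 0.3 are expected from background. It assumes the background estimate is exact, with no systematic uncertainty, which is a simplification.

```python
import math

def prob_at_least(n_min, mean):
    """Poisson probability of observing n_min or more events, given `mean` expected."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n_min))
    return 1.0 - below

# One event observed in the 225-250 bin with 0.3 expected from background.
p = prob_at_least(1, 0.3)
print(f"P(N >= 1 | b = 0.3) = {p:.3f}")
```

The result is about 0.26: background alone would produce one or more events in that bin roughly one time in four.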

>
> From the Decision Tree output graph of attached image D0TqDTs.jpg
> it seems to me that:
> there are high data points for DT less than 0.3;
> there is a very high data point for DT between 0.45 and 0.5;
> and
> there is a high data point for DT between 0.55 and 0.6.

Careful here: they are plotting only single-tag, =2 jet events. They have 36 subsets of data, which pass through as many different decision trees. One would need to examine all of them to draw any conclusion, and by eye you could anyway draw only qualitative ones...

>
> I don't have a good intuitive understanding of the physical significance
> of the various values of DT,
> so
> just looking at the various bins for Tquark mass in slide 47,
> with most emphasis on DT greater than 0.65,
> it seems to me to that maybe D0 might be seeing a significant number
> of singleT events in low-mass regions such as 125-150 GeV,
> although most of the events seem to be in the 175-200 GeV region.
>
> Is there a good paper for me to read that would explain more
> about the physical significance of DT ?

I bet not... You need to go hat in hand to the D0 folks. Even if they do issue a paper on the analysis, the gory details will not be as clear as you would want them to be.

> Is there some physical reason that low DT sees events in 150-175 GeV,
> while the higher DT sees a deficiency of events in 150-175 GeV ?

Not necessarily physical - statistical probably. If systematic, then maybe it is connected to their way of training trees with so many variables correlated to each other. Usually, decision trees may get "overtrained" in such circumstances, and a way to avoid that is to do a random sampling of the variables used at each branching, and grow a huge number of trees rather than a single one, then ask the trees to "vote" for a hypothesis. The random forests algorithm is such a delicious thing; I have a post about it which links to an informative site on that particular algorithm if you want more information. The post is from the beginning of June I think, so you could dig in my blog and find it. If you don't find it, and if you want it, let me know...
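The idea of random variable sampling plus voting can be sketched in a few lines. To keep it short, the "trees" below are depth-one stumps, so each has a single branching and the random choice of variables at that branching is the whole tree; the toy events and all function names are hypothetical. This is an illustration of the principle only, not the random forests algorithm in full and not D0's code.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Return (feature, threshold, label_above) with the fewest training errors."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            for label_above in (0, 1):
                errors = sum(
                    (label_above if row[f] > threshold else 1 - label_above) != y
                    for row, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, label_above)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, label_above = stump
    return label_above if row[f] > threshold else 1 - label_above

def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        # Bootstrap: resample the events with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random subset of the variables offered to this branching.
        feats = rng.sample(all_features, n_features)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Majority vote of all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy events with two variables; 0 = background, 1 = signal.
rows = [(1.0, 0.5), (1.5, 1.0), (2.0, 0.8), (1.2, 1.4),
        (3.0, 2.5), (3.5, 3.0), (2.8, 2.2), (4.0, 2.9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(rows, labels)
print(predict_forest(forest, (4.5, 3.5)), predict_forest(forest, (0.5, 0.2)))
```

Because each stump sees a different resampling of the events and a different variable, no single variable or fluctuation dominates the vote, which is what protects against overtraining.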

> Are there more detailed analyses of the D0 event data that are
> expressed in terms of Tquark mass ?


> Does the D0 analysis explain why the CDF data seemed to see
> singleT events at low Tquark mass,
> or
> is it the opinion of the D0 people that the CDF low Tquark mass data
> is an anomalous fluctuation that will go away with more CDF data ?

D0 data explains nothing... It is consistent with a SM single top, but not inconsistent with other satellite hypotheses IMO.

> Is it reasonable to expect that more data at both CDF and D0 will
> answer these questions ?

I think more data always helps, provided you are willing to let a good hypothesis go if the data disprove it. But it is good to be stubborn for a while longer, especially since nobody has really done a search focused on low-mass single tops...



> Tony
>
>


Date: Thu, 14 Dec 2006 15:33:46 +0100 (CET)

From: Tommaso Dorigo <tommaso.dorigo@pd.infn.it>

(TD reply text in normal type)

To: Tony Smith <f75m17h@mindspring.com>

(TS original message text in preformatted type)

Subject: Re: thanks for speculations blog entry

Hi Tony,

I will try to answer some questions below.


> ... However,
> I do have another suggestion that might be unlikely to be accepted:
>
> It seems to me that if you already have a pretty good idea of what
> you are looking for, then things like Decision Trees, Neural Networks,
> etc., that are trainable would be useful in analyzing data with
> a large number of events,
> but
> the problem is (as you said in your blog entry) "... training ... with
> so many variables correlated to each other ...",
> which may not be very bad if you really do know what you are looking for.
> For example,
> a highly trained thing might be very good for looking at the 175 GeV
> Tquark peak (which is clearly known to exist) in great detail.
>
> However, if you are looking for something that you don't know,
> for example the Higgs mass, then you should be careful (again, as
> you said in your blog entry) to "... keep the most discriminant variable
> aside, and train one's classifier with the others, being careful to avoid
> variables that are correlated with the most powerful one ...".
>

I think there is already an attempt in CDF to look for the unknown without any preconception besides the existence of what is already proven to exist. These are "model-independent" searches. Not blessed yet, but they look at some thousands of permutations of interesting final states involving electrons, muons, missing Et, photons, jets, b-tags, and the like. All are fit with SM processes, and compared to expectations.

If on the other hand one looks at individual events, one can indeed find something weird-looking from time to time, but with billion-event size datasets this is a very dangerous thing to do from an experimental standpoint. We understand our data only from a statistical standpoint, while nobody on earth can say with certainty if an electron candidate is a true electron or calorimeter noise or a pizero traveling close to a pi-plus or what. I am afraid, in other words, that what you propose is a way to do science that we cannot do any longer in present day experiments. It is meaningful in neutrino scattering experiments, on the other hand.

> Back then there were only a few dilepton events and really not-so-many
> semileptonic events, and there were papers like the 1997 UC Berkeley thesis
> of Erich Varnes at was back then available to the public on the web at
> http://www-d0.fnal.gov/publications_talks/thesis/thesis.html
> that described individual candidate events in great detail.
>
> Although my initial idea of 3 Tquark mass peaks came from
> seeing the 1994 CDF semileptonic histogram of only 2 or 3 dozen events
> as shown the original evidence paper,
> I would have rejected my idea a long time ago
> if 3 similar peaks had not been seen in a similar independent D0 histogram,
> and
> if the details of events as disclosed by Erich Varnes's thesis and
> similar papers had not seemed to me to be consistent with my speculation.
>
> Anyhow,
> now that luminosities are much higher and the raw number of candidate
> events is very large,
> it seems that there are two ways to go:
>
> 1 - look at all the large number of events statistically, which probably
> requires some training and may be good for detailed study of expected
> stuff, but may overlook unexpected stuff;
> 2 - simulate the good-old-days of just a few candidate events by taking
> really random small samples and then looking at those individually
> and in great detail, thus possibly seeing something totally unexpected.
> If something unexpected is indicated, then train a statistical thing
> to look closely to see whether it is real or only a fluctuation.

As I said, this can be done but I would bet it is very hard to convince an experimenter to part with his most powerful weapon, that is Monte Carlo simulations and comparisons to them. If one does what you propose on Monte Carlo, one will likely find very unexpected things too... Even if the simulated process is QCD dijet production. I say this out of experience!

> I think that the large collaborations of today love the purely statistical
> approach, because it is in fact very useful for known or expected stuff,
> and is likely to produce consensus results without unpleasant dissent,
> something that bureaucracies (both of the collaboration and the funding
> agencies) like and are comfortable with.

I agree with you, but I think the alternative is not what you propose; rather, it is a signature-based search.

>
> However, I wish that the small-random-sample-in-detail approach were
> also employed to some degree, on the (maybe unlikely but who knows)
> chance of actually seeing something really unexpected.
>
> Tony
>
> PS - Thanks again very much for your very patient and clear
> answers to my speculative questions.

You're welcome...




Tony Smith's Home Page