{"id":188,"date":"2022-01-25T02:59:15","date_gmt":"2022-01-25T02:59:15","guid":{"rendered":"http:\/\/aurimas.eu\/blog\/?p=188"},"modified":"2022-07-23T04:58:11","modified_gmt":"2022-07-23T04:58:11","slug":"interpretation-of-log-transformations-in-linear-models-just-how-accurate-is-it","status":"publish","type":"post","link":"https:\/\/aurimas.eu\/blog\/2022\/01\/interpretation-of-log-transformations-in-linear-models-just-how-accurate-is-it\/","title":{"rendered":"Interpretation of log transformations in linear models: just how accurate is it?"},"content":{"rendered":"\n<p>If linear regression is statistics\/econometrics 101, then log transformations of dependent and independent variables and the associated interpretations must be statistics\/econometrics 102. <\/p>\n\n\n\n<p>Typically, you are told that:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>If you log-transform an independent variable, then the regression coefficient \\( b \\) associated with that variable can be interpreted as &#8220;for every 1% increase in the independent variable, the dependent variable increases by \\( \\frac{b}{100} \\) units&#8221;.<\/li><\/ul>\n\n\n\n<ul class=\"wp-block-list\"><li>If you log-transform a dependent variable, then the regression coefficient \\( b \\) tells you that for every unit increase in the independent variable, the dependent variable increases by \\( 100b\\% \\).<\/li><\/ul>\n\n\n\n<ul class=\"wp-block-list\"><li>If you log-transform both the dependent and independent variables, then the interpretation is that for every 1% increase in the independent variable, the dependent variable increases by \\( b\\% \\). This is especially useful for economists studying price elasticities, for example.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"652\" 
src=\"https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/image-1-1024x652.png?resize=1024%2C652\" alt=\"\" class=\"wp-image-189\" srcset=\"https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/image-1.png?resize=1024%2C652&amp;ssl=1 1024w, https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/image-1.png?resize=300%2C191&amp;ssl=1 300w, https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/image-1.png?resize=768%2C489&amp;ssl=1 768w, https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/image-1.png?w=1426&amp;ssl=1 1426w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption>Lecture notes from the Econometrics class in my undergraduate studies<\/figcaption><\/figure>\n\n\n\n<p>If you are lucky, you may be told that you need to be careful with the interpretation of coefficients in case your dependent variable is log-transformed and the coefficients are not small. And maybe even given the precise formula to obtain the percentage effect: \\( 100 \\left ( e^{b} &#8211; 1 \\right ) \\). That&#8217;s exactly what happened in my class, and I am happy it nudged me to investigate this further. I think a lot of people may overestimate how accurate the percentage-based interpretations really are. <\/p>\n\n\n\n<p>To begin with, where do all these percentages come from? The typical explanation relies on the fact that the first-order derivative of the natural logarithm is \\( \\frac{1}{x} \\). 
But derivatives are all about &#8220;very small&#8221; changes in value and arguably a 1 unit increase\/decrease is not <em>that<\/em> small.<\/p>\n\n\n\n<p>Here&#8217;s what the maths looks like for the simplest model \\(y = a + bx\\):<\/p>\n\n\n\n<p>Log-transform of an independent variable (linear-log model):<\/p>\n\n\n\n<p>$$ y_0 = a + b \\log  \\left( x  \\right)  \\text{ (our &#8220;base case&#8221;) } $$<\/p>\n\n\n\n<p>$$ y_1 = a + b \\log \\left( 1.01x \\right) \\text{ (x increases by 1%) } $$<\/p>\n\n\n\n<p>$$ y_1 - y_0 =  b \\log \\left( 1.01x \\right) - b \\log  \\left( x  \\right)  = b \\log \\left( \\frac{1.01x}{x} \\right) = b \\log \\left(1.01 \\right) $$<\/p>\n\n\n\n<p>In other words, when \\(x\\) increases by 1%, the change in \\(y\\) is not \\( \\frac{b}{100} \\), but rather  \\( b \\log \\left(1.01 \\right) \\). What is  \\( \\log \\left(1.01 \\right) \\)? It&#8217;s approximately \\( 0.00995 \\), which rounds to \\( 0.01 \\), and thus the rule all of us learned. But the rule is about \\( 0.5\\% \\) off! <\/p>\n\n\n\n<p>Consider the following scenario. You are currently selling goods at a price of $100 and last year&#8217;s sales were $10m (100,000 units). You are interested in raising the prices by 15% and you want to know what is a reasonable sales budget for the next year. A data scientist runs some analysis and finds that the regression coefficient is -90,000 and thus reports that for every 1% increase in price, you can expect to sell 900 fewer units. For a 15% price increase, a naive answer would be that 13,500 (90,000 * 15 \/ 100) fewer units will be sold. Your sales budget could thus be 115 * (100,000 &#8211; 13,500) ~ $9.95m. You begrudgingly prepare yourself for explaining to the higher-ups why a price increase will not result in higher sales.<\/p>\n\n\n\n<p>A precise answer, however, would be that you should expect \\( 90000 \\log \\left( 1.15 \\right)  \\approx 12,579 \\) fewer units sold. That would result in a sales budget of about $10.05m, slightly above last year&#8217;s sales rather than below them. A contrived example? 
Perhaps, but there surely are situations where even such a small difference is important enough.<\/p>\n\n\n\n<p>Things get even more interesting in the log-linear model:<\/p>\n\n\n\n<p> $$ \\log \\left( y_0 \\right) = a + b x  \\text{ (our &#8220;base case&#8221;) } $$ <\/p>\n\n\n\n<p>  $$ \\log \\left( y_1 \\right) = a + b \\left( x + 1 \\right)  \\text{ (let&#8217;s add 1 to x) } $$  <\/p>\n\n\n\n<p>$$ \\frac{y_1 - y_0}{y_0} =  \\frac { e^{a + b \\left( x + 1 \\right)}   - e^{ a + b x } } {  e^{ a + b x } }  =   \\frac { e^{a + b x} \\left( e^{b} - 1 \\right)  } {  e^{ a + b x } } =   e^{b} - 1 $$<\/p>\n\n\n\n<p>Assuming \\( b \\) is small,  \\( e^{b} - 1 \\approx b \\). But what if it is not? Here&#8217;s how it looks graphically. If, say, your coefficient is equal to 1, then what it really means is not that &#8220;a 1 unit increase in the independent variable will result in a 100% increase in the response&#8221;. It&#8217;s actually about 172%, because \\( e^{1} - 1 \\approx 1.718 \\). That&#8217;s quite a difference.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"566\" src=\"https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/plot_zoom_png-1024x566.png?resize=1024%2C566\" alt=\"\" class=\"wp-image-231\" srcset=\"https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/plot_zoom_png.png?resize=1024%2C566&amp;ssl=1 1024w, https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/plot_zoom_png.png?resize=300%2C166&amp;ssl=1 300w, https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/plot_zoom_png.png?resize=768%2C424&amp;ssl=1 768w, https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/plot_zoom_png.png?w=1213&amp;ssl=1 1213w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption>Log-linear model coefficient interpretation<\/figcaption><\/figure>\n\n\n\n<p>As you may guess, in a log-log model everything gets compounded, resulting in the following equation. 
If you find that your coefficient is, say, -2, then the precise effect of a 1% increase in the independent variable on the response is not -2% but rather about -1.97%.<\/p>\n\n\n\n<p> $$ \\frac{y_1 - y_0}{y_0} = e^{b \\log \\left( 1.01 \\right)} - 1 $$ <\/p>\n\n\n\n<p>Does it matter all the time? Definitely not. But, at least based on how I was taught these topics, I don&#8217;t think a lot of people are aware of the approximation errors that come with the simple percentage-based interpretation. There may be cases where the approximation error is important enough to be aware of.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Log-transformations and their interpretation as percentage impact are taught in every introductory regression class. But are most people aware that there is a hidden approximation behind the percentage-based intuition? 
One that may not be appropriate in some cases?<\/p>\n","protected":false},"author":1,"featured_media":231,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[67],"tags":[77,76,75],"class_list":["post-188","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analytics","tag-log-transformations","tag-regressions","tag-statistics"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/aurimas.eu\/a\/wp-content\/uploads\/plot_zoom_png.png?fit=1213%2C670&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paWzzQ-32","_links":{"self":[{"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/posts\/188","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/comments?post=188"}],"version-history":[{"count":45,"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/posts\/188\/revisions"}],"predecessor-version":[{"id":384,"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/posts\/188\/revisions\/384"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/media\/23
1"}],"wp:attachment":[{"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/media?parent=188"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/categories?post=188"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aurimas.eu\/blog\/wp-json\/wp\/v2\/tags?post=188"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}