# The Need for a More Complex Class of Functions

# The Problem

A few days ago I saw a tweet like this: “you shouldn’t have 6 return values from a function”. I think many programmers would agree. Functions with too many parameters or return values are hard to read and write, and they force boilerplate on the caller, especially when the caller needs only one of the return values.

Multiple return values have two problems:

1. Inefficiency: the program has to execute the complete function even if the caller only needs a byproduct produced in one of the earlier stages.
2. The caller has to manage 6 different return values. Code like this looks plain ugly to me and is hard to refactor:

~~~~~~
_, _, _, my_value, _, _ = calc_values(...)
// or
my_value = calc_values(...)[3]
// both require knowing the index of the value you want and can be pretty cryptic
~~~~~~

For example, the first approach raises an error as soon as an additional return value is added to the calc_values function, yet it's still a pretty common approach in Python.

From another point of view, as engineers we also don't want to repeat the code we write. So as function designers, we don't want 6 different functions that each do a very similar computation just to make the caller happy. This would cause:

1. Inefficiency again: a caller who wants all of these values separately has to make 6 different calls to very similar functions, or do bookkeeping for the called functions by holding on to precomputed values.
2. As the function writer you have to maintain 6 very similar functions, so refactoring becomes a hassle whenever something needs to change.

## So overall:

1. We want functions to exit as soon as all the statements needed for the requested return values have been executed.
2. We want to make use of the common substructure of the problem, so the caller can receive multiple return values at once without added computational cost when the values are related.
3. We don't want to keep modifying several functions just to satisfy the "a function should do one thing" mantra, which is sometimes good advice, but most of the time it's not.

# Examples of Existing Approaches

## Long/Monolithic Function with Multiple Return Values

Let's illustrate a function that returns several values at once, all related in their calculations. All the named assignments are final (e.g. m_mesh isn't changed between its assignment and the return statement):

~~~~~~
// a semi-realistic 3D graphics function
func sample_light_source (sampler, scene):
    ...
    byproduct_a = ...
    ...
    m_mesh = ...(byproduct_a)
    ...
    m_point = ...(m_mesh)
    ...
    ...
    m_light_ray = ...(byproduct_a, ...)
    ...
    ...
    ...
    inv_sampling_prob = ...(m_light_ray, m_mesh, m_point)
    return m_mesh, m_point, m_light_ray, inv_sampling_prob
~~~~~~

Pros: This implementation is concise (somewhere between 20-25 lines), doesn't do a lot of parameter passing, and is usually easy to refactor. I just think it's neat.

Cons: Now assume another function only needs the 'm_point' object and the request happens pretty frequently, so everything after the m_point assignment is wasted work on each of those calls. On top of that, some arbitrary user thinks this function has too many return values, so you decide to chop it into smaller functions.

## Micro Functions, Several Smaller Functions

~~~~~~
func sample_light_source_mesh (sampler, scene):
    ...
    byproduct_a = ...
    ...
    m_mesh = ...(byproduct_a)
    return byproduct_a, m_mesh

func sample_light_source_point (sampler, scene, m_mesh=None, byproduct_a=None):
    // compute the mesh only if the caller did not pass a precomputed one
    if m_mesh is None:
        byproduct_a, m_mesh = sample_light_source_mesh(sampler, scene)
    ...
    m_point = ...(m_mesh)
    // m_mesh is returned too, because the downstream functions need it
    return byproduct_a, m_mesh, m_point

func sample_light_ray (sampler, scene, m_mesh=None, m_point=None, byproduct_a=None):
    if m_point is None:
        byproduct_a, m_mesh, m_point = sample_light_source_point(sampler, scene, m_mesh, byproduct_a)
    ...
    ...
    m_light_ray = ...(byproduct_a, ...)
    return m_light_ray, m_mesh, m_point

func calc_inv_sampling_prob (sampler, scene, m_mesh=None, byproduct_a=None, m_point=None, m_light_ray=None):
    if m_light_ray is None:
        m_light_ray, m_mesh, m_point = sample_light_ray(sampler, scene, m_mesh, m_point, byproduct_a)
    ...
    ...
    ...
    inv_sampling_prob = ...(m_light_ray, m_mesh, m_point)
    return inv_sampling_prob
~~~~~~

Pros: This design feels very elegant, and there isn't any inefficiency caused by executing unnecessary lines. You can leverage some precomputed values. Every function is doing ~almost~ exactly one thing.

Cons:

1. If you want to make your downstream functions more computationally efficient, the upstream functions still need to return a lot of values OR you need a bunch of if/else statements to keep track of which values are precomputed.
2. The caller needs to do bookkeeping. Imagine the situation where the caller needs to access 'm_point' and 'inv_sampling_prob':

~~~~~~
byproduct_a, m_mesh, m_point = sample_light_source_point(sampler, scene)
inv_sampling_prob = calc_inv_sampling_prob(sampler, scene, m_mesh=m_mesh, byproduct_a=byproduct_a, m_point=m_point)
~~~~~~

We had to keep byproduct_a and m_mesh in our namespace even though we don't need them ourselves, just to do bookkeeping for the functions we are calling. Imagine there are multiple values like this.

3. This is refactoring hell: adding another parameter, another return value, etc. causes at least ~10 lines to change. Add some future byproducts too and this becomes a dreaded part of the codebase. You will be writing boilerplate code that is not satisfying to manage.
4. Overall the code is longer than the monolithic function approach, and you need to keep thinking about the architecture of the functions rather than the actual functions. Writing a single longer function is less mentally draining.

# Proposed Solution

At this point most engineers would offer advice about best programming practices for handling such cases, but I think this issue becomes very simple if we let the programming language handle it with a new class of functions. Just as anonymous/lambda functions were incorporated into object-oriented languages, we need a new type of function for these cases.

In compiled languages like C++, function templates are instantiated for each type they are used with. So even if you write just a single function, the compiler creates many copies of it under the hood. Default parameters serve a similar purpose: a single definition covers the calls with and without the argument (in C++ the compiler fills in the missing argument at the call site), so we don't have to write a separate function for each case.

Although such under-the-hood solutions exist for many situations in programming, as far as I know there isn't any language that does the same for the different combinations of return values. And I feel this is exactly what we need for problems like this. What I am imagining is a syntax like one of these:

## Modified Return Statement

Either we enhance the return statement, and the compiler/interpreter checks when it is safe to break the function up into little pieces, i.e. when a variable is not modified anymore:

~~~~~~
complex_func sample_light_source (sampler, scene):
    ...
    byproduct_a = ...
    ...
    m_mesh = ...(byproduct_a)
    ...
    m_point = ...(m_mesh)
    ...
    ...
    m_light_ray = ...(byproduct_a, ...)
    ...
    ...
    ...
    inv_sampling_prob = ...(m_light_ray, m_mesh, m_point)
    return mesh=m_mesh, point=m_point, ray=m_light_ray, p=inv_sampling_prob

// with this call, the function is partially executed and returns immediately
// once it has calculated the necessary return variables
// example call to the function:
my_mesh:=mesh, my_ray:=ray = sample_light_source(sampler, scene)
~~~~~~

## New Keyword for Some Variables

Or, with a new keyword, we mark the variables we may return, ensuring they are safe to return after this definition:

~~~~~~
complex_func sample_light_source (sampler, scene):
    ...
    byproduct_a = ...
    ...
    returnable m_mesh = ...(byproduct_a)
    ...
    returnable m_point = ...(m_mesh)
    ...
    ...
    returnable m_light_ray = ...(byproduct_a, ...)
    ...
    ...
    ...
    returnable inv_sampling_prob = ...(m_light_ray, m_mesh, m_point)

// again, the function is partially executed
my_mesh:=m_mesh, my_ray:=m_light_ray = sample_light_source(sampler, scene)
~~~~~~

The exact syntax doesn't matter much. The point is that the compiler/language can generate the ~2^4 combinations of this example function in a second, without any effort from the software engineer. The code is much cleaner because there isn't much parameter passing/forwarding going on. You can also give self-explanatory names to your return values (as with named parameters).

# Discussion

I think that because programming languages don't provide this ability, many programmers spend a lot of time dividing their larger functions into smaller parts. In some cases this is almost as horrible as writing template functions by hand for each separate class. For me it's very annoying when I have to refactor a function into two just to access a variable produced halfway through. In my opinion, writing a keyword next to the variable I want to expose would be a much better solution.

Thanks for reading my rant about "what is wrong with all the programming languages". If you know a programming language that has this feature, please let me know. I am not very familiar with many languages apart from a handful of mainstream ones like Python, C++, and Java. I am also interested in design patterns for dealing with these cases. I feel like it is impossible to be efficient, elegant, and free of repeated code at the same time with the current return semantics, but there might be a good trade-off.
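
One design pattern that gets part of the way there in current Python is lazy, cached attributes. The following is only a minimal sketch, assuming Python 3.8+ for `functools.cached_property`. The class name `LightSourceSample`, the `executed` log, and the placeholder arithmetic are all hypothetical stand-ins for the elided graphics computations, not the real code. Each stage runs only when something asks for it, and at most once per object, which roughly mimics the "returnable" behaviour described above.

```python
from functools import cached_property


class LightSourceSample:
    """Hypothetical lazy version of sample_light_source: each stage is
    computed on first access and cached for later accesses."""

    def __init__(self, sampler, scene):
        self.sampler = sampler
        self.scene = scene
        self.executed = []  # log of stages that actually ran (illustration only)

    @cached_property
    def byproduct_a(self):
        value = self.sampler * 2  # placeholder arithmetic, not real graphics code
        self.executed.append("byproduct_a")
        return value

    @cached_property
    def mesh(self):
        value = self.byproduct_a + self.scene
        self.executed.append("mesh")
        return value

    @cached_property
    def point(self):
        value = self.mesh * 10
        self.executed.append("point")
        return value

    @cached_property
    def light_ray(self):
        value = self.byproduct_a - 1
        self.executed.append("light_ray")
        return value

    @cached_property
    def inv_sampling_prob(self):
        value = self.light_ray + self.mesh + self.point
        self.executed.append("inv_sampling_prob")
        return value


sample = LightSourceSample(sampler=3, scene=4)

# Asking for the point runs only the stages it depends on.
my_point = sample.point
stages_after_point = list(sample.executed)  # ["byproduct_a", "mesh", "point"]

# Asking for the probability later reuses the cached stages and adds the rest.
my_prob = sample.inv_sampling_prob
```

The caller no longer carries `byproduct_a` around, and nothing is computed twice. The trade-off is that the body still has to be split into one method per exposed value, so this sketch approximates the efficiency and bookkeeping goals but not the "write one long function" goal.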