Hey ColdFusion programmers.


I'd like to hear how you're using CFThread in your ColdFusion applications.

In addition, what problems have you had while using it, and how (if at all) have you solved them?

What do you dislike about CFThread?

Have you run into significant weaknesses with CFThread or other problems where it simply could not do what you wanted to do?

Have you ever faced problems that required a concurrency solution, and your best option was to use another language instead?

Finally, if there's anything you'd like to add related to concurrency in CF, not specifically related to CFThread, please do tell. This might include other libraries you've used in CF apps (Akka, GPars), or even other languages (Scala, Clojure).

Thanks!
26 comments
 
Michael, was this your typical thread.start(); thread.join() approach? Thanks for commenting!
 
Long running report generation. Throw it in a cfthread, let it go, no join().
 
I use threads for anything that takes over 50ms and the user doesn't care about. Sending emails is probably the most common use. I don't join those back.

I started running into memory and/or garbage collection problems for some tasks with a lot of file manipulation. In my process loop, I spin off a thread, have it do work, then join it back before continuing. This lets the memory level stay flat and the garbage collector fed.

If a job needs to finish faster, then you can create half a dozen main threads to process an equal portion of the process loop.

In my experience, a high number of threads will crash your instance, so I try to keep the active thread count low.
 
Good one. Is this a case where if something goes wrong, you don't care? Follow-up: if you had 100 long running reports, would you throw them all into CFThreads and let CF "manage the pool", as it were?
 
I'd probably throttle them and only spin up a couple of active threads at a time. They still take up resources, so I'd want some resource to spare for user requests.
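
For anyone curious what that throttling looks like on the Java side, here is a minimal sketch, assuming a fixed-size pool stands in for "a couple of active threads at a time". The class name `ThrottledReports`, the method `runThrottled`, and the 5 ms sleep that simulates a report are all made up for illustration; this is not how CF manages cfthread internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledReports {
    // Runs `reportCount` simulated reports on at most `maxActive` threads and
    // returns the highest number of reports observed running at the same time.
    static int runThrottled(int reportCount, int maxActive) {
        ExecutorService pool = Executors.newFixedThreadPool(maxActive);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < reportCount; i++) {
                pending.add(pool.submit(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // stand-in for real report generation
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                    }
                }));
            }
            // Extra reports wait in the pool's queue until a thread frees up.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (Exception e) {
            throw new IllegalStateException("report run failed", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

Calling `ThrottledReports.runThrottled(100, 2)` queues all 100 reports up front but never runs more than two at once, which leaves the rest of the box free for user requests.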
 
The worst bit about threads is debugging them. I suggest making a custom tag to try/catch any code you put in the thread. Log the error or send yourself a dump email, whatever you like.
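
The same "wrap everything in try/catch" idea, sketched in Java rather than as a CF custom tag. The names `GuardedTask`, `guarded`, and `runAndCollect` are hypothetical, and the in-memory list is only a stand-in for whatever log file or dump email you would actually use.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class GuardedTask {
    // Wraps `body` so that any failure is recorded in `errorLog` instead of
    // disappearing along with the thread that raised it.
    static Runnable guarded(String name, Runnable body, List<String> errorLog) {
        return () -> {
            try {
                body.run();
            } catch (Throwable t) {
                errorLog.add(name + " failed: " + t);
            }
        };
    }

    // Runs the guarded body on its own thread, waits for it, and returns the log.
    static List<String> runAndCollect(String name, Runnable body) {
        List<String> errorLog = new CopyOnWriteArrayList<>();
        Thread worker = new Thread(guarded(name, body, errorLog), name);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return errorLog;
    }
}
```

For a fire-and-forget thread you would skip the `join()` and just have the catch block write to a log or send the email; the join is only here so the sketch can hand the errors back.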
 
I save status throughout the process to the database, including error details (assuming the DB being down isn't the problem). First tag after cfthread is cftry.
 
Agree with Bradley. I haven't run into major issues with them, but debugging can be very frustrating sometimes, even for a simple error.
 
Bradley, when you say "keep the active thread count low", do you mean that you keep that setting low in CF Admin? Or that you have your own "manager" of sorts, wherein you track the number of completed threads and don't spawn new ones if that counter gets above a certain limit?
 
Mike, don't worry, I'm very familiar with the Java concurrency framework. That's in fact kinda why I posted this question ;-). I really want to see what people are doing with cfthread in the real world... there's a class of app where cfthread is good enough, and a class where it isn't but we just suck it up and live with it, and I'd like to get as many examples as I can about those two categories.

In fact, there's probably a 3rd category: apps / tasks where we do the simplest thing possible b/c of the limits of CFThread; and if we knew about more powerful concurrency abstractions, perhaps it'd open up new doors for us.
 
I was about to update an application so the event handler spins a new thread for each read operation tasked to the service layer. Finally, these 5-15 reads would be joined within the handler before dispatching the view. Pretty basic, start-join workflow.

Generally, I've elected to use Event Gateways where I can simply because I found the implementation in ACF to be more mature (built in logging, easier to observe) and they can queue thousands of messages.

Recently, we elected to begin moving this main app I work on to Amazon so I decided to wait until the move is complete before continuing my threaded handler project. If their caching solutions are fast enough I may not add the complexity.

Your session at cfO would be my main interest in attending--sadly, looks like I'll be slammed and unable to attend this year. Thanks for bringing this conversation to the masses!

Hope that helps?
 
Marc, thus far, I've only used cfthread for throw-away logging to my db. I have a performance-critical single-threaded web service, so I moved the semi-unimportant database logging into other threads, kind of like a throw-away process; I don't care if or when it completes.
 
The only serious use of CFTHREAD I've done is for invoking calls to a service layer method to publish information to SQL Server Message Queue. This is a fire-and-forget thing, so there is no joining for this.
 
Marc,

We have unattended "jobs" that run in the background every few minutes on behalf of users. Essentially, these jobs execute a number of URLs, stuff the HTTP results in a zip file, and email the zip to the user. There may be many jobs waiting in the queue. The main process pops N jobs off the stack and CFTHREAD is used to execute these in parallel. Other logic controls the maximum number of concurrent jobs (10, I think). Logging is key.

-bill
 
I have a few batch jobs that need to process 3-10k records in a database. For each record I have to call a separate server that has some high-end screen scraping applications installed; that system then screen scrapes another server (typically an internal mainframe) to simulate what a human normally would do. The results are then updated in the database. Since each of the 3k+ records takes 10-90 seconds to process, doing 3k+ processes in sequence can take days.

I use a scheduled task that runs a .cfm page every, say, 5 minutes. The page sets up 3 threads (or however many our mainframe people will allow me to concurrently use to connect to their systems without bringing them to their knees) and tells each thread to process 10 or so orders at a time. The threads are CFC calls.

The downside to this is that, in order to keep the threads from attempting to process the same db records more than once and to minimize the database interactions (i.e. I don't lock each record first), I choose the records to process by selecting those whose primary key (an identity/incrementing value) mod X equals Y, where X is the number of threads and Y is the remainder value assigned to that thread.

It seems to work but it isn't particularly easy to change the thread count on the fly and when I want to use more threads I generally have to change the code, passing in thread counts, the remainder values, and sometimes modifying other code.

The good side is that I can process far more records in 6-24 hours than I could using 1 thread/concurrent connection to the screen scraping/mainframe. Downside is that some threads finish far faster than others, leaving idle capacity, and the hassles with tweaking code endlessly. It isn't an easy process to manage when processing time needs to be optimized.
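
To make the mod-based split concrete, here is a rough sketch in Java. `ModPartition` and `idsForWorker` are names invented for this example; in the real setup the filter would presumably live in the SQL WHERE clause rather than in application code.

```java
import java.util.ArrayList;
import java.util.List;

class ModPartition {
    // Returns the ids that belong to one worker: those whose value modulo
    // `workerCount` equals that worker's `remainder` (0 .. workerCount - 1).
    static List<Long> idsForWorker(List<Long> ids, int workerCount, int remainder) {
        List<Long> mine = new ArrayList<>();
        for (long id : ids) {
            if (Math.floorMod(id, (long) workerCount) == remainder) {
                mine.add(id);
            }
        }
        return mine;
    }
}
```

Every id lands in exactly one worker's slice, so no locking is needed, but the slices are fixed up front. That is why a worker that draws the slow records keeps grinding while the others sit idle.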
 
Posted before, but a couple of extra comments:

Hate the fact that my number of threads is limited by licence in ACF. (Yay for dynamic proxies to get around that chestnut).

Other libraries - GPars is the sh*^. Enough said there.

Starting to slowly get into Clojure as I'm looking more at multi-thread and/or distributed machine processing.

Oh, forgot to mention - my Sesame library, the concurrency collection methods there are pretty much ripped from GPars, and use a dynamic proxy to implement them
https://github.com/markmandel/Sesame
 
+Bill Shelton , this "job" you speak of... are these CF Scheduled tasks? Or threads that just spin forever (while(true)....)? Or something else?
 
+Ryan Hartwich Thank you for the detailed response! Your point about the database is a particularly good one, as in my experience, no matter what server-side tech I'm using, that problem must be addressed. MongoDB has really nice mechanisms for this, but most of us are using RDBMS (for good reason) and consequently have to either lock or come up with other schemes.

Your description reminds me of something that's come out in JDK7 -- ForkJoinPool. Basically, it's the concept of tasks that have subtasks, but on top of that it has a work-stealing algorithm so that when threads go idle, they can go steal work from other members of the pool. Conceptually, related to your situation, it'd be something like this:

1) a single thread queries the work to be done and pre-assigns each DB record to a "task".
2) submit all those tasks to the ForkJoinPool
3) if one member of the pool finishes early, that member can steal work from other members.
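
Those three steps might look roughly like this in Java. It's a sketch only: `ForkJoinRecords`, `processAll`, and `processRecord` are hypothetical names, and `processRecord` just builds a string where the real thing would call the screen scraper. ForkJoinPool is also tuned for CPU-bound work, so for long blocking calls like these you'd want to test whether it actually behaves better than a plain fixed pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

class ForkJoinRecords {
    // Step 1 happens in the caller: one thread gathers the record ids.
    // Step 2: each id becomes a task submitted to the pool.
    // Step 3: idle pool threads pick up whatever tasks are still waiting.
    static List<String> processAll(List<Integer> recordIds, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            List<ForkJoinTask<String>> tasks = new ArrayList<>();
            for (Integer id : recordIds) {
                Callable<String> work = () -> processRecord(id);
                tasks.add(pool.submit(work));
            }
            List<String> results = new ArrayList<>();
            for (ForkJoinTask<String> task : tasks) {
                results.add(task.join()); // results come back in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for the real per-record work (scrape, then update the DB).
    static String processRecord(int id) {
        return "record-" + id + "-done";
    }
}
```

The point of the design is that the thread count becomes a single `parallelism` argument rather than something baked into the record-selection query.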

Fun stuff! Thanks again for the detail.
 
+Mark Mandel Sesame looks fantastic... great work! Agreed... GPars is a pleasure to work with
 
Marc: The scheduler kicks off a task every few minutes. This task will then spin up N threads depending upon the number of jobs in the queue (1 thread per job) and the maximum allowable number of jobs/threads. Each thread then goes out and does its thing. The scheduled task itself is asynchronous and returns as soon as all the threads start, usually in ~1s.
 
We have used cfthread to provide a near real time progress bar for a long running process. We need to upload several files, convert them to PDF, apply DRM, and merge them into a portfolio, all via LiveCycle. What we did was spin off a thread to do the work. As the thread processes, it updates a session variable with the progress as it goes along. For the actual progress bar, the client interface simply polls the server for the current status of the thread and updates the progress bar.
 
Thanks, Adam! Great example
 
+Adam Crump when you say you check the status of the thread, do you mean you're doing something like if( structKeyExists( cfthread, "yourThreadName" ) )... kind of check?
 
+Marc Esher Nah, we need to know where it is at in a 10 step process. So when the request is made we generate a uuid and assign that in session. Spin the thread up and pass it the uuid and return the uuid to the client. Then as the thread does its work it updates the session with where it's at and any messages. When the client polls via ajax it passes in the uuid and gets the current status and any messages, which are then shown in an Ext progress bar.
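
The uuid-plus-polling flow described above can be sketched in Java as follows, assuming a concurrent map plays the role of the session scope. `ProgressTracker`, `start`, `poll`, and `waitFor` are invented names, and the loop body is a placeholder for the real upload/convert/merge steps.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class ProgressTracker {
    // Stand-in for the session scope: job id -> latest status message.
    private final Map<String, String> statusByJob = new ConcurrentHashMap<>();
    private final Map<String, Thread> workers = new ConcurrentHashMap<>();

    // Generates an id, starts the worker thread, and returns the id at once.
    String start(int totalSteps) {
        String jobId = UUID.randomUUID().toString();
        statusByJob.put(jobId, "step 0 of " + totalSteps);
        Thread worker = new Thread(() -> {
            for (int step = 1; step <= totalSteps; step++) {
                // the real work for this step would go here
                statusByJob.put(jobId, "step " + step + " of " + totalSteps);
            }
        });
        workers.put(jobId, worker);
        worker.start();
        return jobId;
    }

    // What the ajax poll would call with the id it was handed.
    String poll(String jobId) {
        return statusByJob.getOrDefault(jobId, "unknown job");
    }

    // Blocks until the job's thread ends; only needed for tests and demos.
    void waitFor(String jobId) {
        Thread worker = workers.get(jobId);
        if (worker == null) {
            return;
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The client never touches the thread itself; it only ever reads the status entry keyed by the id, which is what makes the polling side so simple.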