1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
|
=head1 NAME
libeio - truly asynchronous POSIX I/O
=head1 SYNOPSIS
#include <eio.h>
=head1 DESCRIPTION
The newest version of this document is also available as an html-formatted
web page you might find easier to navigate when reading it for the first
time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>.
Note that this library is a by-product of the C<IO::AIO> perl
module, and many of the subtler points regarding requests lifetime
and so on are only documented in its documentation at the
moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>.
=head2 FEATURES
This library provides fully asynchronous versions of most POSIX functions
dealing with I/O. Unlike most asynchronous libraries, this not only
includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and
similar functions, as well as less rarely ones such as C<mknod>, C<futime>
or C<readlink>.
It also offers wrappers around C<sendfile> (Solaris, Linux, HP-UX and
FreeBSD, with emulation on other platforms) and C<readahead> (Linux, with
emulation elsewhere>).
The goal is to enable you to write fully non-blocking programs. For
example, in a game server, you would not want to freeze for a few seconds
just because the server is running a backup and you happen to call
C<readdir>.
=head2 TIME REPRESENTATION
Libeio represents time as a single floating point number, representing the
(fractional) number of seconds since the (POSIX) epoch (somewhere near
the beginning of 1970, details are complicated, don't ask). This type is
called C<eio_tstamp>, but it is guaranteed to be of type C<double> (or
better), so you can freely use C<double> yourself.
Unlike the name component C<stamp> might indicate, it is also used for
time differences throughout libeio.
=head2 FORK SUPPORT
Usage of pthreads in a program changes the semantics of fork
considerably. Specifically, only async-safe functions can be called after
fork. Libeio uses pthreads, so this applies, and makes using fork hard for
anything but relatively fork + exec uses.
This library only works in the process that initialised it: Forking is
fully supported, but using libeio in any other process than the one that
called C<eio_init> is not.
You might get around by not I<using> libeio before (or after) forking in
the parent, and using it in the child afterwards. You could also try to
call the L<eio_init> function again in the child, which will brutally
reinitialise all data structures, which isn't POSIX conformant, but
typically works.
Otherwise, the only recommendation you should follow is: treat fork code
the same way you treat signal handlers, and only ever call C<eio_init> in
the process that uses it, and only once ever.
=head1 INITIALISATION/INTEGRATION
Before you can call any eio functions you first have to initialise the
library. The library integrates into any event loop, but can also be used
without one, including in polling mode.
You have to provide the necessary glue yourself, however.
=over 4
=item int eio_init (void (*want_poll)(void), void (*done_poll)(void))
This function initialises the library. On success it returns C<0>, on
failure it returns C<-1> and sets C<errno> appropriately.
It accepts two function pointers specifying callbacks as argument, both of
which can be C<0>, in which case the callback isn't called.
There is currently no way to change these callbacks later, or to
"uninitialise" the library again.
=item want_poll callback
The C<want_poll> callback is invoked whenever libeio wants attention (i.e.
it wants to be polled by calling C<eio_poll>). It is "edge-triggered",
that is, it will only be called once when eio wants attention, until all
pending requests have been handled.
This callback is called while locks are being held, so I<you must
not call any libeio functions inside this callback>. That includes
C<eio_poll>. What you should do is notify some other thread, or wake up
your event loop, and then call C<eio_poll>.
=item done_poll callback
This callback is invoked when libeio detects that all pending requests
have been handled. It is "edge-triggered", that is, it will only be
called once after C<want_poll>. To put it differently, C<want_poll> and
C<done_poll> are invoked in pairs: after C<want_poll> you have to call
C<eio_poll ()> until either C<eio_poll> indicates that everything has been
handled or C<done_poll> has been called, which signals the same.
Note that C<eio_poll> might return after C<done_poll> and C<want_poll>
have been called again, so watch out for races in your code.
As with C<want_poll>, this callback is called while locks are being held,
so you I<must not call any libeio functions form within this callback>.
=item int eio_poll ()
This function has to be called whenever there are pending requests that
need finishing. You usually call this after C<want_poll> has indicated
that you should do so, but you can also call this function regularly to
poll for new results.
If any request invocation returns a non-zero value, then C<eio_poll ()>
immediately returns with that value as return value.
Otherwise, if all requests could be handled, it returns C<0>. If for some
reason not all requests have been handled, i.e. some are still pending, it
returns C<-1>.
=back
For libev, you would typically use an C<ev_async> watcher: the
C<want_poll> callback would invoke C<ev_async_send> to wake up the event
loop. Inside the callback set for the watcher, one would call C<eio_poll
()>.
If C<eio_poll ()> is configured to not handle all results in one go
(i.e. it returns C<-1>) then you should start an idle watcher that calls
C<eio_poll> until it returns something C<!= -1>.
A full-featured connector between libeio and libev would look as follows
(if C<eio_poll> is handling all requests, it can of course be simplified a
lot by removing the idle watcher logic):
static struct ev_loop *loop;
static ev_idle repeat_watcher;
static ev_async ready_watcher;
/* idle watcher callback, only used when eio_poll */
/* didn't handle all results in one call */
static void
repeat (EV_P_ ev_idle *w, int revents)
{
if (eio_poll () != -1)
ev_idle_stop (EV_A_ w);
}
/* eio has some results, process them */
static void
ready (EV_P_ ev_async *w, int revents)
{
if (eio_poll () == -1)
ev_idle_start (EV_A_ &repeat_watcher);
}
/* wake up the event loop */
static void
want_poll (void)
{
ev_async_send (loop, &ready_watcher)
}
void
my_init_eio ()
{
loop = EV_DEFAULT;
ev_idle_init (&repeat_watcher, repeat);
ev_async_init (&ready_watcher, ready);
ev_async_start (loop &watcher);
eio_init (want_poll, 0);
}
For most other event loops, you would typically use a pipe - the event
loop should be told to wait for read readiness on the read end. In
C<want_poll> you would write a single byte, in C<done_poll> you would try
to read that byte, and in the callback for the read end, you would call
C<eio_poll>.
You don't have to take special care in the case C<eio_poll> doesn't handle
all requests, as the done callback will not be invoked, so the event loop
will still signal readiness for the pipe until I<all> results have been
processed.
=head1 HIGH LEVEL REQUEST API
Libeio has both a high-level API, which consists of calling a request
function with a callback to be called on completion, and a low-level API
where you fill out request structures and submit them.
This section describes the high-level API.
=head2 REQUEST SUBMISSION AND RESULT PROCESSING
You submit a request by calling the relevant C<eio_TYPE> function with the
required parameters, a callback of type C<int (*eio_cb)(eio_req *req)>
(called C<eio_cb> below) and a freely usable C<void *data> argument.
The return value will either be 0, in case something went really wrong
(which can basically only happen on very fatal errors, such as C<malloc>
returning 0, which is rather unlikely), or a pointer to the newly-created
and submitted C<eio_req *>.
The callback will be called with an C<eio_req *> which contains the
results of the request. The members you can access inside that structure
vary from request to request, except for:
=over 4
=item C<ssize_t result>
This contains the result value from the call (usually the same as the
syscall of the same name).
=item C<int errorno>
This contains the value of C<errno> after the call.
=item C<void *data>
The C<void *data> member simply stores the value of the C<data> argument.
=back
The return value of the callback is normally C<0>, which tells libeio to
continue normally. If a callback returns a nonzero value, libeio will
stop processing results (in C<eio_poll>) and will return the value to its
caller.
Memory areas passed to libeio must stay valid as long as a request
executes, with the exception of paths, which are being copied
internally. Any memory libeio itself allocates will be freed after the
finish callback has been called. If you want to manage all memory passed
to libeio yourself you can use the low-level API.
For example, to open a file, you could do this:
static int
file_open_done (eio_req *req)
{
if (req->result < 0)
{
/* open() returned -1 */
errno = req->errorno;
perror ("open");
}
else
{
int fd = req->result;
/* now we have the new fd in fd */
}
return 0;
}
/* the first three arguments are passed to open(2) */
/* the remaining are priority, callback and data */
if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0))
abort (); /* something went wrong, we will all die!!! */
Note that you additionally need to call C<eio_poll> when the C<want_cb>
indicates that requests are ready to be processed.
=head2 CANCELLING REQUESTS
Sometimes the need for a request goes away before the request is
finished. In that case, one can cancel the request by a call to
C<eio_cancel>:
=over 4
=item eio_cancel (eio_req *req)
Cancel the request (and all its subrequests). If the request is currently
executing it might still continue to execute, and in other cases it might
still take a while till the request is cancelled.
Even if cancelled, the finish callback will still be invoked - the
callbacks of all cancellable requests need to check whether the request
has been cancelled by calling C<EIO_CANCELLED (req)>:
static int
my_eio_cb (eio_req *req)
{
if (EIO_CANCELLED (req))
return 0;
}
In addition, cancelled requests will I<either> have C<< req->result >>
set to C<-1> and C<errno> to C<ECANCELED>, or I<otherwise> they were
successfully executed, despite being cancelled (e.g. when they have
already been executed at the time they were cancelled).
C<EIO_CANCELLED> is still true for requests that have successfully
executed, as long as C<eio_cancel> was called on them at some point.
=back
=head2 AVAILABLE REQUESTS
The following request functions are available. I<All> of them return the
C<eio_req *> on success and C<0> on failure, and I<all> of them have the
same three trailing arguments: C<pri>, C<cb> and C<data>. The C<cb> is
mandatory, but in most cases, you pass in C<0> as C<pri> and C<0> or some
custom data value as C<data>.
=head3 POSIX API WRAPPERS
These requests simply wrap the POSIX call of the same name, with the same
arguments. If a function is not implemented by the OS and cannot be emulated
in some way, then all of these return C<-1> and set C<errorno> to C<ENOSYS>.
=over 4
=item eio_open (const char *path, int flags, mode_t mode, int pri, eio_cb cb, void *data)
=item eio_truncate (const char *path, off_t offset, int pri, eio_cb cb, void *data)
=item eio_chown (const char *path, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
=item eio_chmod (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
=item eio_mkdir (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
=item eio_rmdir (const char *path, int pri, eio_cb cb, void *data)
=item eio_unlink (const char *path, int pri, eio_cb cb, void *data)
=item eio_utime (const char *path, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
=item eio_mknod (const char *path, mode_t mode, dev_t dev, int pri, eio_cb cb, void *data)
=item eio_link (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
=item eio_symlink (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
=item eio_rename (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
=item eio_mlock (void *addr, size_t length, int pri, eio_cb cb, void *data)
=item eio_close (int fd, int pri, eio_cb cb, void *data)
=item eio_sync (int pri, eio_cb cb, void *data)
=item eio_fsync (int fd, int pri, eio_cb cb, void *data)
=item eio_fdatasync (int fd, int pri, eio_cb cb, void *data)
=item eio_futime (int fd, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
=item eio_ftruncate (int fd, off_t offset, int pri, eio_cb cb, void *data)
=item eio_fchmod (int fd, mode_t mode, int pri, eio_cb cb, void *data)
=item eio_fchown (int fd, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
=item eio_dup2 (int fd, int fd2, int pri, eio_cb cb, void *data)
These have the same semantics as the syscall of the same name, their
return value is available as C<< req->result >> later.
=item eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
=item eio_write (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
These two requests are called C<read> and C<write>, but actually wrap
C<pread> and C<pwrite>. On systems that lack these calls (such as cygwin),
libeio uses lseek/read_or_write/lseek and a mutex to serialise the
requests, so all these requests run serially and do not disturb each
other. However, they still disturb the file offset while they run, so it's
not safe to call these functions concurrently with non-libeio functions on
the same fd on these systems.
Not surprisingly, pread and pwrite are not thread-safe on Darwin (OS/X),
so it is advised not to submit multiple requests on the same fd on this
horrible pile of garbage.
=item eio_mlockall (int flags, int pri, eio_cb cb, void *data)
Like C<mlockall>, but the flag value constants are called
C<EIO_MCL_CURRENT> and C<EIO_MCL_FUTURE>.
=item eio_msync (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
Just like msync, except that the flag values are called C<EIO_MS_ASYNC>,
C<EIO_MS_INVALIDATE> and C<EIO_MS_SYNC>.
=item eio_readlink (const char *path, int pri, eio_cb cb, void *data)
If successful, the path read by C<readlink(2)> can be accessed via C<<
req->ptr2 >> and is I<NOT> null-terminated, with the length specified as
C<< req->result >>.
if (req->result >= 0)
{
char *target = strndup ((char *)req->ptr2, req->result);
free (target);
}
=item eio_realpath (const char *path, int pri, eio_cb cb, void *data)
Similar to the realpath libc function, but unlike that one, C<<
req->result >> is C<-1> on failure. On success, the result is the length
of the returned path in C<ptr2> (which is I<NOT> 0-terminated) - this is
similar to readlink.
=item eio_stat (const char *path, int pri, eio_cb cb, void *data)
=item eio_lstat (const char *path, int pri, eio_cb cb, void *data)
=item eio_fstat (int fd, int pri, eio_cb cb, void *data)
Stats a file - if C<< req->result >> indicates success, then you can
access the C<struct stat>-like structure via C<< req->ptr2 >>:
EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;
=item eio_statvfs (const char *path, int pri, eio_cb cb, void *data)
=item eio_fstatvfs (int fd, int pri, eio_cb cb, void *data)
Stats a filesystem - if C<< req->result >> indicates success, then you can
access the C<struct statvfs>-like structure via C<< req->ptr2 >>:
EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2;
=back
=head3 READING DIRECTORIES
Reading directories sounds simple, but can be rather demanding, especially
if you want to do stuff such as traversing a directory hierarchy or
processing all files in a directory. Libeio can assist these complex tasks
with it's C<eio_readdir> call.
=over 4
=item eio_readdir (const char *path, int flags, int pri, eio_cb cb, void *data)
This is a very complex call. It basically reads through a whole directory
(via the C<opendir>, C<readdir> and C<closedir> calls) and returns either
the names or an array of C<struct eio_dirent>, depending on the C<flags>
argument.
The C<< req->result >> indicates either the number of files found, or
C<-1> on error. On success, null-terminated names can be found as C<< req->ptr2 >>,
and C<struct eio_dirents>, if requested by C<flags>, can be found via C<<
req->ptr1 >>.
Here is an example that prints all the names:
int i;
char *names = (char *)req->ptr2;
for (i = 0; i < req->result; ++i)
{
printf ("name #%d: %s\n", i, names);
/* move to next name */
names += strlen (names) + 1;
}
Pseudo-entries such as F<.> and F<..> are never returned by C<eio_readdir>.
C<flags> can be any combination of:
=over 4
=item EIO_READDIR_DENTS
If this flag is specified, then, in addition to the names in C<ptr2>,
also an array of C<struct eio_dirent> is returned, in C<ptr1>. A C<struct
eio_dirent> looks like this:
struct eio_dirent
{
int nameofs; /* offset of null-terminated name string in (char *)req->ptr2 */
unsigned short namelen; /* size of filename without trailing 0 */
unsigned char type; /* one of EIO_DT_* */
signed char score; /* internal use */
ino_t inode; /* the inode number, if available, otherwise unspecified */
};
The only members you normally would access are C<nameofs>, which is the
byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>.
C<type> can be one of:
C<EIO_DT_UNKNOWN> - if the type is not known (very common) and you have to C<stat>
the name yourself if you need to know,
one of the "standard" POSIX file types (C<EIO_DT_REG>, C<EIO_DT_DIR>, C<EIO_DT_LNK>,
C<EIO_DT_FIFO>, C<EIO_DT_SOCK>, C<EIO_DT_CHR>, C<EIO_DT_BLK>)
or some OS-specific type (currently
C<EIO_DT_MPC> - multiplexed char device (v7+coherent),
C<EIO_DT_NAM> - xenix special named file,
C<EIO_DT_MPB> - multiplexed block device (v7+coherent),
C<EIO_DT_NWK> - HP-UX network special,
C<EIO_DT_CMP> - VxFS compressed,
C<EIO_DT_DOOR> - solaris door, or
C<EIO_DT_WHT>).
This example prints all names and their type:
int i;
struct eio_dirent *ents = (struct eio_dirent *)req->ptr1;
char *names = (char *)req->ptr2;
for (i = 0; i < req->result; ++i)
{
struct eio_dirent *ent = ents + i;
char *name = names + ent->nameofs;
printf ("name #%d: %s (type %d)\n", i, name, ent->type);
}
=item EIO_READDIR_DIRS_FIRST
When this flag is specified, then the names will be returned in an order
where likely directories come first, in optimal C<stat> order. This is
useful when you need to quickly find directories, or you want to find all
directories while avoiding to stat() each entry.
If the system returns type information in readdir, then this is used
to find directories directly. Otherwise, likely directories are names
beginning with ".", or otherwise names with no dots, of which names with
short names are tried first.
=item EIO_READDIR_STAT_ORDER
When this flag is specified, then the names will be returned in an order
suitable for stat()'ing each one. That is, when you plan to stat()
all files in the given directory, then the returned order will likely
be fastest.
If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then the
likely directories come first, resulting in a less optimal stat order.
=item EIO_READDIR_FOUND_UNKNOWN
This flag should not be specified when calling C<eio_readdir>. Instead,
it is being set by C<eio_readdir> (you can access the C<flags> via C<<
req->int1 >>, when any of the C<type>'s found were C<EIO_DT_UNKNOWN>. The
absence of this flag therefore indicates that all C<type>'s are known,
which can be used to speed up some algorithms.
A typical use case would be to identify all subdirectories within a
directory - you would ask C<eio_readdir> for C<EIO_READDIR_DIRS_FIRST>. If
then this flag is I<NOT> set, then all the entries at the beginning of the
returned array of type C<EIO_DT_DIR> are the directories. Otherwise, you
should start C<stat()>'ing the entries starting at the beginning of the
array, stopping as soon as you found all directories (the count can be
deduced by the link count of the directory).
=back
=back
=head3 OS-SPECIFIC CALL WRAPPERS
These wrap OS-specific calls (usually Linux ones), and might or might not
be emulated on other operating systems. Calls that are not emulated will
return C<-1> and set C<errno> to C<ENOSYS>.
=over 4
=item eio_sendfile (int out_fd, int in_fd, off_t in_offset, size_t length, int pri, eio_cb cb, void *data)
Wraps the C<sendfile> syscall. The arguments follow the Linux version, but
libeio supports and will use similar calls on FreeBSD, HP/UX, Solaris and
Darwin.
If the OS doesn't support some sendfile-like call, or the call fails,
indicating support for the given file descriptor type (for example,
Linux's sendfile might not support file to file copies), then libeio will
emulate the call in userspace, so there are almost no limitations on its
use.
=item eio_readahead (int fd, off_t offset, size_t length, int pri, eio_cb cb, void *data)
Calls C<readahead(2)>. If the syscall is missing, then the call is
emulated by simply reading the data (currently in 64kiB chunks).
=item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data)
Calls C<sync_file_range>. If the syscall is missing, then this is the same
as calling C<fdatasync>.
Flags can be any combination of C<EIO_SYNC_FILE_RANGE_WAIT_BEFORE>,
C<EIO_SYNC_FILE_RANGE_WRITE> and C<EIO_SYNC_FILE_RANGE_WAIT_AFTER>.
=item eio_fallocate (int fd, int mode, off_t offset, off_t len, int pri, eio_cb cb, void *data)
Calls C<fallocate> (note: I<NOT> C<posix_fallocate>!). If the syscall is
missing, then it returns failure and sets C<errno> to C<ENOSYS>.
The C<mode> argument can be C<0> (for behaviour similar to
C<posix_fallocate>), or C<EIO_FALLOC_FL_KEEP_SIZE>, which keeps the size
of the file unchanged (but still preallocates space beyond end of file).
=back
=head3 LIBEIO-SPECIFIC REQUESTS
These requests are specific to libeio and do not correspond to any OS call.
=over 4
=item eio_mtouch (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
Reads (C<flags == 0>) or modifies (C<flags == EIO_MT_MODIFY) the given
memory area, page-wise, that is, it reads (or reads and writes back) the
first octet of every page that spans the memory area.
This can be used to page in some mmapped file, or dirty some pages. Note
that dirtying is an unlocked read-write access, so races can ensue when
the some other thread modifies the data stored in that memory area.
=item eio_custom (void (*)(eio_req *) execute, int pri, eio_cb cb, void *data)
Executes a custom request, i.e., a user-specified callback.
The callback gets the C<eio_req *> as parameter and is expected to read
and modify any request-specific members. Specifically, it should set C<<
req->result >> to the result value, just like other requests.
Here is an example that simply calls C<open>, like C<eio_open>, but it
uses the C<data> member as filename and uses a hardcoded C<O_RDONLY>. If
you want to pass more/other parameters, you either need to pass some
struct or so via C<data> or provide your own wrapper using the low-level
API.
static int
my_open_done (eio_req *req)
{
int fd = req->result;
return 0;
}
static void
my_open (eio_req *req)
{
req->result = open (req->data, O_RDONLY);
}
eio_custom (my_open, 0, my_open_done, "/etc/passwd");
=item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data)
This is a request that takes C<delay> seconds to execute, but otherwise
does nothing - it simply puts one of the worker threads to sleep for this
long.
This request can be used to artificially increase load, e.g. for debugging
or benchmarking reasons.
=item eio_nop (int pri, eio_cb cb, void *data)
This request does nothing, except go through the whole request cycle. This
can be used to measure latency or in some cases to simplify code, but is
not really of much use.
=back
=head3 GROUPING AND LIMITING REQUESTS
There is one more rather special request, C<eio_grp>. It is a very special
aio request: Instead of doing something, it is a container for other eio
requests.
There are two primary use cases for this: a) bundle many requests into a
single, composite, request with a definite callback and the ability to
cancel the whole request with its subrequests and b) limiting the number
of "active" requests.
Further below you will find more discussion of these topics - first
follows the reference section detailing the request generator and other
methods.
=over 4
=item eio_req *grp = eio_grp (eio_cb cb, void *data)
Creates, submits and returns a group request. Note that it doesn't have a
priority, unlike all other requests.
=item eio_grp_add (eio_req *grp, eio_req *req)
Adds a request to the request group.
=item eio_grp_cancel (eio_req *grp)
Cancels all requests I<in> the group, but I<not> the group request
itself. You can cancel the group request I<and> all subrequests via a
normal C<eio_cancel> call.
=back
=head4 GROUP REQUEST LIFETIME
Left alone, a group request will instantly move to the pending state and
will be finished at the next call of C<eio_poll>.
The usefulness stems from the fact that, if a subrequest is added to a
group I<before> a call to C<eio_poll>, via C<eio_grp_add>, then the group
will not finish until all the subrequests have finished.
So the usage cycle of a group request is like this: after it is created,
you normally instantly add a subrequest. If none is added, the group
request will finish on it's own. As long as subrequests are added before
the group request is finished it will be kept from finishing, that is the
callbacks of any subrequests can, in turn, add more requests to the group,
and as long as any requests are active, the group request itself will not
finish.
=head4 CREATING COMPOSITE REQUESTS
Imagine you wanted to create an C<eio_load> request that opens a file,
reads it and closes it. This means it has to execute at least three eio
requests, but for various reasons it might be nice if that request looked
like any other eio request.
This can be done with groups:
=over 4
=item 1) create the request object
Create a group that contains all further requests. This is the request you
can return as "the load request".
=item 2) open the file, maybe
Next, open the file with C<eio_open> and add the request to the group
request and you are finished setting up the request.
If, for some reason, you cannot C<eio_open> (path is a null ptr?) you
can set C<< grp->result >> to C<-1> to signal an error and let the group
request finish on its own.
=item 3) open callback adds more requests
In the open callback, if the open was not successful, copy C<<
req->errorno >> to C<< grp->errorno >> and set C<< grp->errorno >> to
C<-1> to signal an error.
Otherwise, malloc some memory or so and issue a read request, adding the
read request to the group.
=item 4) continue issuing requests till finished
In the real callback, check for errors and possibly continue with
C<eio_close> or any other eio request in the same way.
As soon as no new requests are added the group request will finish. Make
sure you I<always> set C<< grp->result >> to some sensible value.
=back
=head4 REQUEST LIMITING
#TODO
void eio_grp_limit (eio_req *grp, int limit);
=back
=head1 LOW LEVEL REQUEST API
#TODO
=head1 ANATOMY AND LIFETIME OF AN EIO REQUEST
A request is represented by a structure of type C<eio_req>. To initialise
it, clear it to all zero bytes:
eio_req req;
memset (&req, 0, sizeof (req));
A more common way to initialise a new C<eio_req> is to use C<calloc>:
eio_req *req = calloc (1, sizeof (*req));
In either case, libeio neither allocates, initialises or frees the
C<eio_req> structure for you - it merely uses it.
zero
#TODO
=head2 CONFIGURATION
The functions in this section can sometimes be useful, but the default
configuration will do in most case, so you should skip this section on
first reading.
=over 4
=item eio_set_max_poll_time (eio_tstamp nseconds)
This causes C<eio_poll ()> to return after it has detected that it was
running for C<nsecond> seconds or longer (this number can be fractional).
This can be used to limit the amount of time spent handling eio requests,
for example, in interactive programs, you might want to limit this time to
C<0.01> seconds or so.
Note that:
=over 4
=item a) libeio doesn't know how long your request callbacks take, so the
time spent in C<eio_poll> is up to one callback invocation longer then
this interval.
=item b) this is implemented by calling C<gettimeofday> after each
request, which can be costly.
=item c) at least one request will be handled.
=back
=item eio_set_max_poll_reqs (unsigned int nreqs)
When C<nreqs> is non-zero, then C<eio_poll> will not handle more than
C<nreqs> requests per invocation. This is a less costly way to limit the
amount of work done by C<eio_poll> then setting a time limit.
If you know your callbacks are generally fast, you could use this to
encourage interactiveness in your programs by setting it to C<10>, C<100>
or even C<1000>.
=item eio_set_min_parallel (unsigned int nthreads)
Make sure libeio can handle at least this many requests in parallel. It
might be able handle more.
=item eio_set_max_parallel (unsigned int nthreads)
Set the maximum number of threads that libeio will spawn.
=item eio_set_max_idle (unsigned int nthreads)
Libeio uses threads internally to handle most requests, and will start and stop threads on demand.
This call can be used to limit the number of idle threads (threads without
work to do): libeio will keep some threads idle in preparation for more
requests, but never longer than C<nthreads> threads.
In addition to this, libeio will also stop threads when they are idle for
a few seconds, regardless of this setting.
=item unsigned int eio_nthreads ()
Return the number of worker threads currently running.
=item unsigned int eio_nreqs ()
Return the number of requests currently handled by libeio. This is the
total number of requests that have been submitted to libeio, but not yet
destroyed.
=item unsigned int eio_nready ()
Returns the number of ready requests, i.e. requests that have been
submitted but have not yet entered the execution phase.
=item unsigned int eio_npending ()
Returns the number of pending requests, i.e. requests that have been
executed and have results, but have not been finished yet by a call to
C<eio_poll>).
=back
=head1 EMBEDDING
Libeio can be embedded directly into programs. This functionality is not
documented and not (yet) officially supported.
Note that, when including C<libeio.m4>, you are responsible for defining
the compilation environment (C<_LARGEFILE_SOURCE>, C<_GNU_SOURCE> etc.).
If you need to know how, check the C<IO::AIO> perl module, which does
exactly that.
=head1 COMPILETIME CONFIGURATION
These symbols, if used, must be defined when compiling F<eio.c>.
=over 4
=item EIO_STACKSIZE
This symbol governs the stack size for each eio thread. Libeio itself
was written to use very little stackspace, but when using C<EIO_CUSTOM>
requests, you might want to increase this.
If this symbol is undefined (the default) then libeio will use its default
stack size (C<sizeof (void *) * 4096> currently). If it is defined, but
C<0>, then the default operating system stack size will be used. In all
other cases, the value must be an expression that evaluates to the desired
stack size.
=back
=head1 PORTABILITY REQUIREMENTS
In addition to a working ISO-C implementation, libeio relies on a few
additional extensions:
=over 4
=item POSIX threads
To be portable, this module uses threads, specifically, the POSIX threads
library must be available (and working, which partially excludes many xBSD
systems, where C<fork ()> is buggy).
=item POSIX-compatible filesystem API
This is actually a harder portability requirement: The libeio API is quite
demanding regarding POSIX API calls (symlinks, user/group management
etc.).
=item C<double> must hold a time value in seconds with enough accuracy
The type C<double> is used to represent timestamps. It is required to
have at least 51 bits of mantissa (and 9 bits of exponent), which is good
enough for at least into the year 4000. This requirement is fulfilled by
implementations implementing IEEE 754 (basically all existing ones).
=back
If you know of other additional requirements drop me a note.
=head1 AUTHOR
Marc Lehmann <libeio@schmorp.de>.
|